Abstract. We study weak convergence of empirical processes of dependent data $(X_i)_{i\ge 0}$,
indexed by classes of functions. Our results are especially suitable for data arising from
dynamical systems and Markov chains, where the central limit theorem for partial sums
of observables is commonly derived via the spectral gap technique. We are specifically
interested in situations where the index class $\mathcal{F}$ is different from the class of functions $f$
for which we have good properties of the observables $(f(X_i))_{i\ge 0}$. We introduce a new
bracketing number to measure the size of the index class $\mathcal{F}$ which fits this setting. Our
results apply to the empirical process of data $(X_i)_{i\ge 0}$ satisfying a multiple mixing condition.
This includes dynamical systems and Markov chains, if the Perron-Frobenius operator or
the Markov operator has a spectral gap, but also extends beyond this class, e.g. to ergodic
torus automorphisms.



1. Introduction 

Let $(X_i)_{i\ge 0}$ be a stationary stochastic process of $\mathbb{R}$-valued random variables with marginal distribution $\mu$. We denote the empirical measure of order $n$ by $\mu_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{X_i}$. The classical empirical process is defined by $U_n(t) = \sqrt{n}\,(\mu_n((-\infty,t]) - \mu((-\infty,t]))$, $t \in \mathbb{R}$. In the case of i.i.d. processes, the limit behavior of the empirical process was first investigated by Donsker (1952), who proved that $(U_n(t))_{t\in\mathbb{R}}$ converges weakly to a Brownian bridge process. This result, known as Donsker's empirical process central limit theorem, confirmed a conjecture of Doob (1949), who had observed that certain functionals of the empirical process converge in distribution towards the corresponding functionals of a Brownian bridge. Donsker's empirical process CLT has been generalized to dependent data by many authors. One of the earliest results is due to Billingsley (1968), who considered functions of mixing processes, with an application to the empirical distribution of the remainders in a continued fraction expansion.

Empirical processes play a very important role in large sample statistical inference. Many statistical estimators and test statistics can be expressed as functionals of the empirical distribution. As a result, their asymptotic distribution can often be derived from empirical process limit theorems, combined with the continuous mapping theorem or a functional delta method. A well-known example is the Kolmogorov-Smirnov goodness-of-fit test, which uses the test statistic $D_n := \sup_{t\in\mathbb{R}} \sqrt{n}\,|\mu_n((-\infty,t]) - \mu_0((-\infty,t])|$ in order to test the null hypothesis that $\mu_0$ is the marginal distribution of $X_1$. Under the null hypothesis, the limit distribution of $D_n$ is given by the supremum of the Gaussian limit of the empirical process. Further examples are von Mises statistics, also known as V-statistics. These are defined as $V_n := \frac{1}{n^2}\sum_{1\le i,j\le n} h(X_i, X_j)$, where $h(x,y)$ is a symmetric kernel function. Specific examples include the sample variance and Gini's mean difference, where the kernel functions are given by $(x-y)^2/2$ and $|x-y|$, respectively. V-statistics can be expressed as integrals with respect to the empirical distribution function, namely $V_n = \int\!\!\int h(x,y)\,d\mu_n(x)\,d\mu_n(y)$. The asymptotic distribution of $V_n$ can then be derived via a functional delta method from an empirical process central limit theorem; see e.g. Beutner and Zähle (2012) for some recent results.
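The two functionals mentioned above are easy to evaluate on a sample. The following is a minimal sketch, not taken from the paper: the function names `ks_statistic` and `v_statistic` are hypothetical, the null distribution is assumed to have a continuous distribution function, and NumPy is assumed to be available. It only computes the statistics; it says nothing about their limit laws under dependence.

```python
import numpy as np

def ks_statistic(x, cdf0):
    """sqrt(n) * sup_t |F_n(t) - F_0(t)| for a sample x and a continuous null CDF cdf0."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    f0 = cdf0(x)
    # For continuous F_0 the supremum is approached at a sample point,
    # either at the jump of F_n or just before it.
    upper = np.max(np.arange(1, n + 1) / n - f0)
    lower = np.max(f0 - np.arange(0, n) / n)
    return np.sqrt(n) * max(upper, lower)

def v_statistic(x, kernel):
    """V_n = n^{-2} * sum_{i,j} h(X_i, X_j), i.e. the double integral of h against mu_n x mu_n."""
    x = np.asarray(x, dtype=float)
    # Broadcasting evaluates the kernel on all n^2 pairs (i, j), including i == j.
    return kernel(x[:, None], x[None, :]).mean()

# Kernels named in the text: sample variance and Gini's mean difference.
variance_kernel = lambda a, b: (a - b) ** 2 / 2.0
gini_kernel = lambda a, b: np.abs(a - b)
```

With the variance kernel, `v_statistic` returns the variance with normalization $1/n$ (not $1/(n-1)$), because the diagonal terms $i = j$ are included in the double sum.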

Empirical process CLTs for $\mathbb{R}^d$-valued i.i.d. data $(X_i)_{i\ge 0}$ were first studied by Dudley (1966), Neuhaus (1971), Bickel and Wichura (1971) and Straf (1972). These authors consider the classical $d$-dimensional empirical process $\sqrt{n}\,(\mu_n((-\infty,t]) - \mu((-\infty,t]))$, where $(-\infty,t] = \{x \in \mathbb{R}^d : x_1 \le t_1, \dots, x_d \le t_d\}$, $t \in \mathbb{R}^d$, denotes the semi-infinite rectangle in $\mathbb{R}^d$. Philipp and Pinzur (1980), Philipp (1984) and Dhompongsa (1984) studied weak convergence of the multivariate empirical process in the case of mixing data.



Dudley (1978) initiated the study of empirical processes indexed by classes of sets, or more generally by classes of functions. This approach allows the study of empirical processes for very general data, not necessarily having values in Euclidean space. CLTs for empirical processes indexed by classes of functions require entropy conditions on the size of the index set. For i.i.d. data, Dudley (1978) obtained the CLT for empirical processes indexed by classes of sets satisfying an entropy condition with inclusion. Ossiander (1987) used an entropy condition with bracketing to obtain results for empirical processes indexed by classes of functions. For the theory of empirical processes of i.i.d. data, indexed by classes of functions, see the book by van der Vaart and Wellner (1996). Limit theorems for more general empirical processes indexed by classes of functions have also been studied under entropy conditions for general covering numbers, e.g. by Nolan and Pollard (1987), who investigate empirical U-processes.

In the case of strongly mixing data, Andrews and Pollard (1994) were the first to obtain CLTs for empirical processes indexed by classes of functions. Doukhan, Massart, and Rio (1995) and Rio (1998) study empirical processes for absolutely regular data. Borovkova, Burton, and Dehling (2001) investigate the empirical process and the empirical U-process for data that can be represented as functionals of absolutely regular processes. For further results, see the survey article by Dehling and Philipp (2002), the book by Dedecker, Doukhan, Lang, León R., Louhichi, and Prieur (2007), as well as the paper by Dedecker and Prieur (2007).

A lot of research has been devoted to the study of statistical properties of data arising from dynamical systems or from Markov chains. A very powerful technique to prove CLTs and other limit theorems is the spectral gap method, using spectral properties of the Perron-Frobenius operator or the Markov operator on an appropriate space of functions; see Hennion and Hervé (2001). When the space of functions under consideration contains the class of indicator functions of intervals, standard tools can be used to establish the classical empirical process CLT. Finite-dimensional convergence of the empirical process follows from the CLT for $\sum_{i=1}^{n} 1_{(-\infty,t]}(X_i)$, and tightness can be established using moment bounds for $\sum_{i=1}^{n} 1_{(s,t]}(X_i)$. Collet, Martinez, and Schmitt (2004) used this approach to establish the empirical process CLT for expanding maps of the unit interval.

The situation differs markedly when the CLT and moment bounds are not directly available for the index class of the empirical process, but only for a different class of functions. Recently, Dehling, Durieu, and Volný (2009) developed techniques to cover such situations. They were able to prove classical empirical process CLTs for $\mathbb{R}$-valued data when the CLT and moment bounds are only available for Lipschitz functions. Dehling and Durieu (2011) extended these techniques to $\mathbb{R}^d$-valued data satisfying a multiple mixing condition for Hölder continuous functions. Under this condition, they proved the CLT for the empirical process indexed by semi-infinite rectangles $(-\infty,t]$, $t \in \mathbb{R}^d$. The multiple mixing condition is strictly weaker than the spectral gap condition. For example, ergodic torus automorphisms satisfy a multiple mixing condition, while generally they do not have a spectral gap. Dehling and Durieu (2011) proved the empirical process CLT for ergodic torus automorphisms. Durieu and Tusche (2012) provide very general conditions under which the classical empirical process CLT for $\mathbb{R}^d$-valued data holds.

The above mentioned papers study exclusively classical empirical processes, indexed by semi-infinite intervals or rectangles. It is the goal of the present paper to extend the techniques developed by Dehling et al. (2009) to empirical processes indexed by classes of functions. Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, let $(X_i)_{i\ge 0}$ be a stationary process of $\mathcal{X}$-valued random variables, and let $\mathcal{F}$ be a uniformly bounded class of real-valued functions on $\mathcal{X}$. We consider the $\mathcal{F}$-indexed empirical process $\big(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(f(X_i) - E f(X_i))\big)_{f\in\mathcal{F}}$. As in the above mentioned papers, we will assume that there exists some Banach space $\mathcal{B}$ of functions on $\mathcal{X}$ such that the CLT and a moment bound hold for partial sums $\sum_{i=1}^{n} g(X_i)$, for all $g$ in some subset of $\mathcal{B}$; see Assumptions 1 and 2. These conditions are satisfied, e.g. when the Perron-Frobenius operator or the Markov operator acting on $\mathcal{B}$ has a spectral gap. Again, if the index class $\mathcal{F}$ is a subset of $\mathcal{B}$, standard techniques for proving empirical process CLTs can be applied. In many examples, however, $\mathcal{B}$ is some class of regular functions, while $\mathcal{F}$ is a class of indicators of sets. It is the goal of the present paper to provide techniques suitable for this situation.

Empirical process invariance principles require a control on the size of the index class $\mathcal{F}$, as measured by covering or bracketing numbers; see e.g. van der Vaart and Wellner (1996). In this paper, we will consider coverings of $\mathcal{F}$ by $\mathcal{B}$-brackets, i.e. brackets bounded by functions $l, u \in \mathcal{B}$. Because of the specific character of our moment bounds, we have to impose conditions on the $\mathcal{B}$-norms of $l$ and $u$. We will thus introduce a notion of bracketing numbers by counting how many $\mathcal{B}$-brackets of a given $L^s$-size and with a given control on the $\mathcal{B}$-norms of the upper and lower functions are needed to cover $\mathcal{F}$. The main theorem of the present paper establishes an empirical process CLT under an integral condition on this bracketing number.

This paper is organized as follows: Section 2 contains precise definitions as well as the statement of the main theorem. In Section 3, we will specifically consider the case when $\mathcal{B}$ is the space of Hölder continuous functions. We will give examples of classes of functions which satisfy the bracketing number assumption. In Section 4, we will give applications to ergodic torus automorphisms which extend the empirical process CLT of Dehling and Durieu (2011) to more general classes of sets. Section 5 contains the proof of our main theorem, while proofs of technical aspects of the examples can be found in the appendix.

2. Main Result 

Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and let $(X_i)_{i\in\mathbb{N}}$ be an $\mathcal{X}$-valued stationary stochastic process with marginal distribution $\mu$. Let $\mathcal{F}$ be a uniformly bounded class of real-valued measurable functions defined on $\mathcal{X}$. If $Q$ is a signed measure on $(\mathcal{X}, \mathcal{A})$, we use the notation $Qf = \int_{\mathcal{X}} f\,dQ$. We define the map $F_n : \mathcal{F} \to \mathbb{R}$, induced by the empirical measure,

$$F_n(f) = \frac{1}{n}\sum_{i=1}^{n} f(X_i).$$

The $\mathcal{F}$-indexed empirical process of order $n$ is given by

$$U_n(f) = \sqrt{n}\,(F_n(f) - \mu f) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(f(X_i) - \mu f), \qquad f \in \mathcal{F}.$$

We regard the empirical process $(U_n(f))_{f\in\mathcal{F}}$ as a random element of $\ell^\infty(\mathcal{F})$; this is possible because $\mathcal{F}$ is supposed to be uniformly bounded. $\ell^\infty(\mathcal{F})$ is equipped with the supremum norm and the Borel $\sigma$-field generated by the open sets. It is well known that, in general, $(U_n(f))_{f\in\mathcal{F}}$ is not measurable and thus the usual theory of weak convergence of random variables does not apply. We use here the theory which is based on convergence of outer expectations; see van der Vaart and Wellner (1996). Given a Borel probability measure $L$ on $\ell^\infty(\mathcal{F})$, we say that $(U_n(f))_{n\ge 1}$ converges in distribution to $L$ if

$$E^*(\varphi(U_n)) \to \int \varphi(x)\,dL(x),$$

for all bounded and continuous functions $\varphi : \ell^\infty(\mathcal{F}) \to \mathbb{R}$. Here $E^*$ denotes the outer integral. Note that $E^*(X) = E(X^*)$, where $X^*$ denotes the measurable cover function of $X$; see Lemma 1.2.1 in van der Vaart and Wellner (1996).

In what follows, we will frequently make two assumptions concerning the process $(f(X_i))_{i\in\mathbb{N}}$, where $f : \mathcal{X} \to \mathbb{R}$ belongs to some Banach space $(\mathcal{B}, \|\cdot\|_{\mathcal{B}})$ of measurable functions on $\mathcal{X}$, respectively to some subset $\mathcal{G} \subset \mathcal{B}$. The precise choice of $\mathcal{B}$, as well as of $\mathcal{G}$, will depend on the specific example. Often, we take $\mathcal{B}$ to be the space of all Lipschitz or Hölder continuous functions, and $\mathcal{G}$ the intersection of $\mathcal{B}$ with an $\ell^\infty(\mathcal{X})$-ball.

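Assumption 1 (CLT for $\mathcal{B}$-observables): For all $f \in \mathcal{B}$, there exists a $\sigma_f^2 \ge 0$ such that

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(f(X_i) - \mu f) \xrightarrow{\mathcal{D}} N(0, \sigma_f^2), \tag{2.1}$$

where $N(0, \sigma^2)$ denotes the normal law with mean zero and variance $\sigma^2$.

Assumption 2 (Moment bounds for $\mathcal{G}$-observables): For some subset $\mathcal{G} \subset \mathcal{B}$, $s \ge 1$, and $a \in \mathbb{R}$, for all $p \ge 1$, there exists a constant $C_p > 0$ such that for all $f \in \mathcal{G} - \mathcal{G} := \{g_1 - g_2 : g_1, g_2 \in \mathcal{G}\}$,

$$E\Big(\sum_{i=1}^{n}(f(X_i) - \mu f)\Big)^{2p} \le C_p \sum_{i=1}^{p} n^i\,\|f\|_s^i\,\log^{2p+ai}(\|f\|_{\mathcal{B}} + 1), \tag{2.2}$$

where $\|f\|_s = \big(\int_{\mathcal{X}} |f|^s\,d\mu\big)^{1/s}$ denotes the $L^s$-norm of $f$.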



Both Assumption 1 and Assumption 2 have been established by many authors for a wide range of stationary processes. Concerning the CLT, see e.g. the three-volume monograph by Bradley (2007) for mixing processes, Dedecker et al. (2007) for so-called weakly dependent processes in the sense of Doukhan and Louhichi (1999), and Hennion and Hervé (2001) for many examples of Markov chains and dynamical systems. Durieu (2008) proved 4th moment bounds of the type (2.2) for Markov chains or dynamical systems for which the Markov operator or the Perron-Frobenius operator acting on $\mathcal{B}$ has a spectral gap. This was generalized to $2p$-th moment bounds by Dehling and Durieu (2011). More generally, they gave similar moment bounds for processes satisfying a multiple mixing condition, i.e. assuming that there exist a $\theta \in (0,1)$ and an integer $d_0 \in \mathbb{N}$ such that for all integers $p \ge 1$, there exist an integer $\ell$ and a multivariate polynomial $P$ of total degree smaller than $d_0$ such that



$$|\mathrm{Cov}(f(X_{i_0}) \cdots f(X_{i_{q-1}}),\, f(X_{i_q}) \cdots f(X_{i_p}))| \le \|f\|_s\,\|f\|_{\mathcal{B}}^{\ell}\,P(i_1 - i_0, \dots, i_p - i_{p-1})\,\theta^{\,i_q - i_{q-1}} \tag{2.3}$$

holds for all $f \in \mathcal{B}$ with $\mu f = 0$ and $\|f\|_\infty \le 1$, all integers $i_0 \le i_1 \le \dots \le i_p$ and all $q \in \{1, \dots, p\}$. See Theorem 4 and the examples in Dehling and Durieu (2011). Note that this multiple mixing condition implies the moment bound (2.2) for $\mathcal{G} = \{f \in \mathcal{B} : \|f\|_\infty \le 1\}$ and $a = d_0 - 1$. Further, the spectral gap property leads to the multiple mixing condition with $d_0 = 0$, and thus to the moment bound (2.2) with $a = -1$; see Dehling and Durieu (2011), Section 4.



We will derive a general statement about weak convergence of the empirical process $(U_n(f))_{f\in\mathcal{F}}$ under the two assumptions (2.1) and (2.2). Empirical process central limit theorems require bounds on the size of the class of functions $\mathcal{F}$, usually measured by the number of $\varepsilon$-balls required to cover $\mathcal{F}$. Here we will introduce a covering number adapted to the fact that (2.1) and (2.2) hold only for $f \in \mathcal{B}$ or $f \in \mathcal{G}$, respectively, and that both the $\mathcal{B}$-norm and the $L^s(\mu)$-norm enter on the right hand side of the bound (2.2). In our approach, we use $\mathcal{B}$-brackets to cover the class $\mathcal{F}$, which leads to the following definition.

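Definition. Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and let $\mu$ be a probability measure on $(\mathcal{X}, \mathcal{A})$. Let $\mathcal{B}$ be some Banach space of measurable functions on $\mathcal{X}$, $\mathcal{G} \subset \mathcal{B}$ and $s \ge 1$.

(i) Given two functions $l, u : \mathcal{X} \to \mathbb{R}$ satisfying $l(x) \le u(x)$ for all $x \in \mathcal{X}$, we define the bracket

$$[l, u] := \{f : \mathcal{X} \to \mathbb{R} : l(x) \le f(x) \le u(x) \text{ for all } x \in \mathcal{X}\}.$$

Given $\varepsilon, A > 0$, we call $[l, u]$ an $(\varepsilon, A, \mathcal{G}, L^s(\mu))$-bracket if $l, u \in \mathcal{G}$ and

$$\|u - l\|_s \le \varepsilon, \qquad \|u\|_{\mathcal{B}} \le A, \qquad \|l\|_{\mathcal{B}} \le A,$$

where $\|\cdot\|_s$ denotes the $L^s(\mu)$-norm.

(ii) For a class of measurable functions $\mathcal{F}$, defined on $\mathcal{X}$, we define the bracketing number $N(\varepsilon, A, \mathcal{F}, \mathcal{G}, L^s(\mu))$ as the smallest number of $(\varepsilon, A, \mathcal{G}, L^s(\mu))$-brackets needed to cover $\mathcal{F}$.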



Our definition is close to the definition of bracketing numbers given by Ossiander (1987), but different. In Ossiander (1987), no assumptions are made on the upper and lower functions of the bracket other than that they are close in $L^2$. Here, the moment bound (2.2) forces us to require the extra condition that $u$ and $l$ belong to the space $\mathcal{B}$ and that their $\mathcal{B}$-norms are controlled. Obviously, our bracketing numbers are always larger than the ones defined in Ossiander (1987), and naturally our conditions on the size of $\mathcal{F}$ are stronger. On the other hand, our results apply to dependent data, while Ossiander (1987) studies i.i.d. data.

We can now state the main theorem of the present paper. The proof will be given in Section 5.

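Theorem 2.1. Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, let $(X_i)_{i\ge 1}$ be an $\mathcal{X}$-valued stationary stochastic process with marginal distribution $\mu$, and let $\mathcal{F}$ be a uniformly bounded class of measurable functions on $\mathcal{X}$. Suppose that for some Banach space $\mathcal{B}$ of measurable functions on $\mathcal{X}$, some subset $\mathcal{G} \subset \mathcal{B}$, $a \in \mathbb{R}$, and $s \ge 1$, Assumptions 1 and 2 hold. Moreover, assume that there exist constants $r > -1$, $\gamma > \max\{2 + a, 1\}$ and $C > 0$ such that

$$\int_0^1 \varepsilon^{r} \sup_{\varepsilon \le \delta \le 1} N^2(\delta, \exp(C\delta^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu))\,d\varepsilon < \infty. \tag{2.4}$$

Then the empirical process $(U_n(f))_{f\in\mathcal{F}}$ converges in distribution in $\ell^\infty(\mathcal{F})$ to a tight Gaussian process $(W(f))_{f\in\mathcal{F}}$.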

Remark 2.2. (i) Note that the bracketing number $N(\delta, \exp(C\delta^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu))$ might not be a monotone function of $\delta$. This is the reason why we take the supremum in the integral (2.4).

(ii) The proof of Theorem 2.1 shows that the statement also holds if condition (2.2) is only satisfied for some integer $p$ satisfying

$$p > \frac{(r + 1)\gamma}{\gamma - \max\{2 + a, 1\}}.$$

(iii) If for some $r' > 0$,

$$N(\varepsilon, \exp(C\varepsilon^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)) = O(\varepsilon^{-r'})$$

as $\varepsilon \to 0$, condition (2.4) is satisfied for all $r > 2r' - 1$.
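The computation behind (iii) can be sketched as follows. This is only an illustration under a simplifying assumption that is not part of the remark: we suppose that the bound $N(\delta, \exp(C\delta^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)) \le K\delta^{-r'}$ holds for all $\delta \in (0,1]$ with some constant $K$ (a hypothetical name), and not merely asymptotically as $\delta \to 0$.

```latex
% Step 1: the supremum over [eps, 1] is attained at delta = eps, since r' >= 0.
\sup_{\varepsilon \le \delta \le 1}
  N^2\bigl(\delta, \exp(C\delta^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)\bigr)
  \le K^2 \sup_{\varepsilon \le \delta \le 1} \delta^{-2r'}
  = K^2 \varepsilon^{-2r'} .

% Step 2: insert this into the integral in (2.4).
\int_0^1 \varepsilon^{r}
  \sup_{\varepsilon \le \delta \le 1}
  N^2\bigl(\delta, \exp(C\delta^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)\bigr)
  \, d\varepsilon
  \le K^2 \int_0^1 \varepsilon^{\,r - 2r'} \, d\varepsilon
  = \frac{K^2}{r - 2r' + 1} < \infty
  \quad \text{whenever } r - 2r' > -1 .
```

The last condition is exactly $r > 2r' - 1$.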

In the next section, we will present examples of classes of functions satisfying condition (2.4). Among the examples are indicators of multidimensional rectangles, of ellipsoids, and of balls of arbitrary metrics, as well as a class of monotone functions. In Section 4 we give applications to ergodic torus automorphisms, indexed by various classes of indicator functions.

3. Examples of Classes of Functions

In many examples that satisfy Assumptions 1 and 2, the Banach space $\mathcal{B}$ is the space of Lipschitz or Hölder continuous functions; see the examples in Dehling et al. (2009), Dehling and Durieu (2011), or Durieu and Tusche (2012). Thus, in this section, we will restrict our attention to the case where $\mathcal{B}$ is a space of Hölder functions and give several examples of classes $\mathcal{F}$ which satisfy the entropy condition (2.4).

In this section, we consider a metric space $(\mathcal{X}, d)$. Let $\alpha \in (0, 1]$ be fixed. We denote by $\mathcal{H}_\alpha(\mathcal{X})$ the space of bounded $\alpha$-Hölder continuous functions on $\mathcal{X}$ with values in $\mathbb{R}$. This space is equipped with the norm

$$\|f\|_\alpha := \sup_{x\in\mathcal{X}} |f(x)| + \sup_{x,y\in\mathcal{X},\ x\ne y} \frac{|f(x) - f(y)|}{d(x,y)^\alpha}.$$

For this section we choose $\mathcal{B} = \mathcal{H}_\alpha(\mathcal{X})$. As the approximating class we use the subclass $\mathcal{G} = \mathcal{H}_\alpha(\mathcal{X}, [0,1]) := \{f \in \mathcal{H}_\alpha(\mathcal{X}) : 0 \le f \le 1\}$ of $\mathcal{B}$. Except in Example 5, in all examples we will consider the case where $\mathcal{X}$ is a subset of $\mathbb{R}^d$ equipped with the Euclidean norm denoted by $|\cdot|$, where $d \ge 1$ is some fixed integer.

In most of the examples, we will use the transition function given in the following definition, which uses the notations

$$d_A(x) := \inf_{a\in A} d(x, a) \qquad \text{and} \qquad d(A, B) := \inf_{a\in A,\, b\in B} d(a, b),$$

for any element $x \in \mathcal{X}$ and sets $A, B \subset \mathcal{X}$, where we define $\inf \emptyset := +\infty$.

Definition. Let $A, B$ be subsets of $\mathcal{X}$ such that $d(A, B) > 0$. We define the transition function $T[A, B] : \mathcal{X} \to \mathbb{R}$ by

$$T[A, B](x) := \frac{d_B(x)}{d_B(x) + d_A(x)},$$

if $A$ and $B$ are non-empty, $T[A, B] := 0$ if $A = \emptyset$, and $T[A, B] := 1$ if $B = \emptyset$ but $A \ne \emptyset$.

Observe that we have $T[A, B](\mathcal{X}) \subset [0, 1]$, $T[A, B](x) = 1$ for all $x \in A$ and $T[A, B](x) = 0$ for all $x \in B$.
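The transition function can be illustrated numerically. The sketch below is not from the paper: it assumes the form $T[A,B](x) = d_B(x)/(d_A(x) + d_B(x))$, which is consistent with the stated properties ($T = 1$ on $A$, $T = 0$ on $B$, values in $[0,1]$), and it restricts $A$ and $B$ to finite point sets in $\mathbb{R}^d$ with the Euclidean norm. The names `dist_to_set` and `transition` are hypothetical.

```python
import numpy as np

def dist_to_set(x, points):
    """d_A(x) = inf_{a in A} |x - a| for a finite set A (rows of `points`); +inf if A is empty."""
    points = np.asarray(points, dtype=float)
    if points.size == 0:
        return np.inf
    return np.min(np.linalg.norm(points - np.asarray(x, dtype=float), axis=1))

def transition(A, B):
    """Return the map x -> T[A, B](x), assuming A and B are finite and disjoint."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)

    def T(x):
        if A.size == 0:          # convention: T[A, B] := 0 if A is empty
            return 0.0
        if B.size == 0:          # convention: T[A, B] := 1 if B is empty but A is not
            return 1.0
        d_a, d_b = dist_to_set(x, A), dist_to_set(x, B)
        return d_b / (d_a + d_b)  # well defined because d_a + d_b >= d(A, B) > 0

    return T
```

For instance, with $A = \{0, 1\}$ and $B = \{3\}$ on the real line, the function equals 1 at both points of $A$, equals 0 at 3, and takes the value 1/2 at the point 2, which is at distance 1 from both sets.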

Lemma 3.1. For any subsets $A, B$ of $\mathcal{X}$ such that $d(A, B) > 0$, the transition function $T[A, B]$ is a bounded $\alpha$-Hölder continuous function and we have

$$\|T[A, B]\|_\alpha \le 1 + \frac{3}{d(A, B)^\alpha}.$$

This lemma is proved in the appendix.

We also use the following notations: For a non-decreasing function $F$ from $\mathbb{R}$ to $\mathbb{R}$, $F^{-1}$ denotes the pseudo-inverse function defined by $F^{-1}(t) := \sup\{x \in \mathbb{R} : F(x) \le t\}$, where $\sup \emptyset = -\infty$. The modulus of continuity of $F$ is defined by

$$\omega_F(\delta) = \sup\{|F(x) - F(y)| : |x - y| \le \delta\}.$$

Constants that only depend on fixed parameters $p_1, \dots, p_k$ will be denoted with these parameters in the subscript, such as $c_{p_1,\dots,p_k}$. Furthermore, the notation $f(x) = O_{p_1,\dots,p_k}(g(x))$ as $x \to 0$ or $x \to \infty$ means that there exists a constant $c_{p_1,\dots,p_k}$ such that $f(x) \le c_{p_1,\dots,p_k}\,g(x)$ for all $x$ sufficiently small or large, respectively.

3.1. Example 1: Indicators of Rectangles. Here, we consider $\mathcal{X} = \mathbb{R}^d$. In its classical form, the empirical process is defined by the class of indicator functions of left infinite rectangles, i.e. the class $\{1_{(-\infty,t]} : t \in \mathbb{R}^d\}$, where $(-\infty,t]$ denotes the set of points $x$ such that $x \le t$ (see footnote 1). Under similar assumptions as in the present paper, this case was treated by Dehling and Durieu (2011). We will see that Theorem 2.1 covers the results of that paper.

The following proposition gives an upper bound for the bracketing number of the larger class

$$\mathcal{F} = \{1_{(t,u]} : t, u \in [-\infty, +\infty]^d,\ t \le u\},$$

where $(t, u]$ denotes the rectangle which consists of the points $x$ such that $t < x$ and $x \le u$.

Proposition 3.2. Let $s \ge 1$, $\gamma > 1$, and let $\mu$ be a probability distribution on $\mathbb{R}^d$ whose distribution function $F$ satisfies

$$\omega_F(x) = O(|\log(x)|^{-s\gamma}) \quad \text{as } x \to 0. \tag{3.1}$$

Then there exists a constant $C = C_F > 0$ such that

$$N(\varepsilon, \exp(C\varepsilon^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_d(\varepsilon^{-2ds}) \quad \text{as } \varepsilon \to 0,$$

where $\mathcal{G} = \mathcal{H}_\alpha(\mathbb{R}^d, [0,1])$.

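Proof. Let $\varepsilon \in (0, 1)$ and $m = \lfloor 6d\varepsilon^{-s} + 1 \rfloor$. For all $i \in \{1, \dots, d\}$ and $j \in \{0, \dots, m\}$, we define the quantiles

$$t_{i,j} := F_i^{-1}\Big(\frac{j}{m}\Big),$$

where $F_i^{-1}$ is the pseudo-inverse of the marginal distribution function $F_i$ (see footnote 2). Now, if $j = (j_1, \dots, j_d) \in \{0, \dots, m\}^d$, we write

$$t_j := (t_{1,j_1}, \dots, t_{d,j_d}).$$

In the following definitions, for convenience, we will also denote by $t_{i,-1}$ or $t_{i,-2}$ the points $t_{i,0}$ and by $t_{i,m+1}$ the points $t_{i,m}$. We introduce the brackets $[l_{k,j}, u_{k,j}]$, $k \in \{0, \dots, m\}^d$, $j \in \{0, \dots, m\}^d$, $k \le j$, given by the $\alpha$-Hölder functions

$$l_{k,j}(x) := T\big[[t_{k+1}, t_{j-2}],\ \mathbb{R}^d \setminus [t_k, t_{j-1}]\big](x)$$

and

$$u_{k,j}(x) := T\big[[t_{k-1}, t_j],\ \mathbb{R}^d \setminus [t_{k-2}, t_{j+1}]\big](x),$$

where we have used the convention that $[s, t] = \emptyset$ if $s \not\le t$ and that the addition of an integer to a multi-index is the addition of the integer to every component of the multi-index.

For each $k \le j$, we have

$$\|l_{k,j} - u_{k,j}\|_s^s \le \mu\big([t_{k-2}, t_{j+1}] \setminus [t_{k+1}, t_{j-2}]\big) \le \sum_{i=1}^{d} \Big( |F_i(t_{i,k_i+1}) - F_i(t_{i,k_i-2})| + |F_i(t_{i,j_i+1}) - F_i(t_{i,j_i-2})| \Big) \le 2 \cdot \frac{3d}{m},$$

and thus $\|l_{k,j} - u_{k,j}\|_s \le \varepsilon$. Moreover, since for $a \le b \le b' \le a'$,

$$d\big([b, b'],\ \mathbb{R}^d \setminus [a, a']\big) = \inf_{i=1,\dots,d} \inf\{|a_i - b_i|,\ |a_i' - b_i'|\},$$

using Lemma 3.1 and (3.1), we have

$$\begin{aligned}
\|l_{k,j}\|_\alpha &\le 1 + 3\Big(\inf_{i=1,\dots,d}\ \inf_{j} |t_{i,j+1} - t_{i,j}|\Big)^{-\alpha} \\
&\le 1 + 3\Big(\inf\Big\{x > 0 : \exists\, i \in \{1, \dots, d\},\ \exists\, t,\ F_i(t + x) - F_i(t) \ge \frac{1}{m}\Big\}\Big)^{-\alpha} \\
&\le 1 + 3\Big(\inf\Big\{x > 0 : c_F\,|\log(x)|^{-s\gamma} \ge \frac{1}{m}\Big\}\Big)^{-\alpha} \\
&\le 1 + 3\exp\big(\alpha\,(c_F m)^{1/(s\gamma)}\big),
\end{aligned}$$

where $c_F$ is given by (3.1). The same bound holds for $\|u_{k,j}\|_\alpha$.

Thus, there exists a new constant $C_F > 0$ such that for all $k \le j \in \{0, \dots, m\}^d$, $[l_{k,j}, u_{k,j}]$ is an $(\varepsilon, \exp(C_F\varepsilon^{-1/\gamma}), \mathcal{G}, L^s(\mu))$-bracket. It is clear that for each function $f \in \mathcal{F}$ there exists a bracket of the form $[l_{k,j}, u_{k,j}]$ which contains $f$. Further, we have at most $(m + 1)^{2d}$ such brackets, which proves the proposition. □

Footnote 1: On $\mathbb{R}^d$, we use the partial order: $x \le t$ if and only if $x_i \le t_i$ for all $i = 1, \dots, d$.
Footnote 2: $F_i(t) = \mu(\mathbb{R} \times \cdots \times \mathbb{R} \times (-\infty, t] \times \mathbb{R} \times \cdots \times \mathbb{R})$.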
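The bookkeeping in this proof (grid size, bracket size, number of brackets) can be checked numerically. The following is a small sketch, assuming the grid size $m = \lfloor 6d\varepsilon^{-s} + 1 \rfloor$, the size bound $6d/m$ and the count $(m+1)^{2d}$ from the proof; the function name `rectangle_bracket_budget` is hypothetical and nothing here evaluates actual brackets.

```python
import math

def rectangle_bracket_budget(eps, d, s):
    """Grid size, L^s-size bound (to the power s) and bracket count used in the proof."""
    m = math.floor(6 * d * eps ** (-s) + 1)  # number of quantile steps per coordinate
    size_bound = 6 * d / m                   # upper bound on ||l - u||_s^s
    count = (m + 1) ** (2 * d)               # upper bound on the number of pairs (k, j)
    return m, size_bound, count
```

Since $\lfloor y + 1 \rfloor > y$, the choice of $m$ guarantees $6d/m \le \varepsilon^s$, and $(m+1)^{2d}$ grows like $\varepsilon^{-2ds}$ as $\varepsilon \to 0$, which is the rate stated in Proposition 3.2.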



Notice that under the assumptions of the proposition, condition (2.4) is satisfied and therefore Theorem 2.1 may be applied to empirical processes indexed by the class of indicators of rectangles, taking $\mathcal{B}$ to be the class of bounded Hölder functions.

Corollary 3.3. Let $(X_i)_{i\ge 0}$ be an $\mathbb{R}^d$-valued stationary process. Let $\mathcal{F}$ be the class of indicator functions of rectangles in $\mathbb{R}^d$ and let $\mathcal{G} = \mathcal{H}_\alpha(\mathbb{R}^d, [0,1])$. Assume that, for some $s \ge 1$, $a \in \mathbb{R}$, and $\gamma > \max\{2 + a, 1\}$, Assumptions 1 and 2 hold, and that the distribution function of the $X_i$ satisfies (3.1). Then the empirical process $(U_n(f))_{f\in\mathcal{F}}$ converges in distribution in $\ell^\infty(\mathcal{F})$ to a tight Gaussian process.

Remark 3.4. By regarding the class of indicator functions of left infinite rectangles as a sub-class of $\mathcal{F}$, we obtain Theorem 1 of Dehling and Durieu (2011) as a particular case of the preceding corollary.

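3.2. Example 2: Indicators of Multidimensional Balls in the Unit Cube. Here, we consider the class $\mathcal{F}$ of indicator functions of balls on $\mathcal{X} = [0,1]^d$, i.e.

$$\mathcal{F} := \{1_{B(x,r)} : x \in [0,1]^d,\ r > 0\},$$

where $B(x, r) = \{y \in [0,1]^d : |x - y| < r\}$. We have the following upper bound.

Proposition 3.5. Let $\mu$ be a probability distribution on $[0,1]^d$ with a density bounded by some $B > 0$ and let $s \ge 1$. Then there exists a constant $C = C_{d,B} > 0$ such that

$$N(\varepsilon, C\varepsilon^{-\alpha s}, \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_{d,B}(\varepsilon^{-(d+1)s}) \quad \text{as } \varepsilon \to 0,$$

where $\mathcal{G} = \mathcal{H}_\alpha([0,1]^d, [0,1])$.

Note that the second argument in the bracketing number is different from the one appearing in the condition (2.4). In this situation we have a stronger type of bracketing number than in (2.4).

Proof. Let $\varepsilon > 0$ be fixed and $m = \lfloor \varepsilon^{-s} \rfloor$. For all $i = (i_1, \dots, i_d) \in \{0, \dots, m\}^d$, we denote by $c_i$ the center of the rectangle $\big[\frac{i_1 - 1}{m}, \frac{i_1}{m}\big] \times \cdots \times \big[\frac{i_d - 1}{m}, \frac{i_d}{m}\big]$. Then we define, for $i \in \{1, \dots, m\}^d$ and $j \in \{0, \dots, m\}$, the functions

$$l_{i,j}(x) := T\Big[B\Big(c_i, \frac{j - 2}{m}\sqrt{d}\Big),\ [0,1]^d \setminus B\Big(c_i, \frac{j - 1}{m}\sqrt{d}\Big)\Big](x)$$

and

$$u_{i,j}(x) := T\Big[B\Big(c_i, \frac{j + 2}{m}\sqrt{d}\Big),\ [0,1]^d \setminus B\Big(c_i, \frac{j + 3}{m}\sqrt{d}\Big)\Big](x),$$

where we use the convention that a ball with negative radius is the empty set.

By Lemma 3.1, these functions are $\alpha$-Hölder and, since $d(B(x, r), \mathbb{R}^d \setminus B(x, r')) = r' - r$, we have

$$\|l_{i,j}\|_\alpha \le 1 + 3\Big(\frac{m}{\sqrt{d}}\Big)^{\alpha} \le 1 + 3\varepsilon^{-\alpha s}.$$

The same bound holds for $\|u_{i,j}\|_\alpha$. Since $\mu$ has a bounded density with respect to Lebesgue measure, we also have

$$\|l_{i,j} - u_{i,j}\|_s^s \le \mu\Big(B\Big(c_i, \frac{j + 3}{m}\sqrt{d}\Big) \setminus B\Big(c_i, \frac{j - 2}{m}\sqrt{d}\Big)\Big) \le B\,c_d\Big(\Big(\frac{j + 3}{m}\sqrt{d}\Big)^{d} - \Big(\frac{j - 2}{m}\sqrt{d}\Big)^{d}\Big),$$

where $c_d$ is the constant $\frac{\pi^{d/2}}{\Gamma(d/2 + 1)}$ ($\Gamma$ is the gamma function). Hence,

$$\|l_{i,j} - u_{i,j}\|_s \le c_{d,B}^{1/s}\,\varepsilon$$

as $\varepsilon \to 0$, where $c_{d,B}$ is a constant depending only on $d$ and $B$.

Now, if $f$ belongs to $\mathcal{F}$, then $f = 1_{B(x,r)}$ for some $x \in [0,1]^d$ and $0 < r \le \sqrt{d}$. Thus, there exist some $i = (i_1, \dots, i_d) \in \{0, \dots, m\}^d$ and $j \in \{0, \dots, m\}$ such that

$$x \in \Big[\frac{i_1 - 1}{m}, \frac{i_1}{m}\Big] \times \cdots \times \Big[\frac{i_d - 1}{m}, \frac{i_d}{m}\Big] \qquad \text{and} \qquad \frac{j}{m}\sqrt{d} \le r \le \frac{j + 1}{m}\sqrt{d}.$$

We then have $l_{i,j} \le f \le u_{i,j}$.

Thus, the $(m + 1)m^d$ brackets $[l_{i,j}, u_{i,j}]$, $i \in \{1, \dots, m\}^d$ and $j \in \{0, \dots, m\}$, cover the class $\mathcal{F}$. Therefore, $N(c_{d,B}^{1/s}\varepsilon, 4\varepsilon^{-\alpha s}, \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_{d,B}(\varepsilon^{-(d+1)s})$ as $\varepsilon \to 0$, which implies that there exists a constant $C_{d,B} > 0$ for which $N(\varepsilon, C_{d,B}\varepsilon^{-\alpha s}, \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_{d,B}(\varepsilon^{-(d+1)s})$ as $\varepsilon \to 0$. □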

3.3. Example 3: Indicators of Uniformly Bounded Multidimensional Ellipsoids Centered in the Unit Cube. Set $\mathcal{X} = \mathbb{R}^d$. Here, we consider the class of ellipsoids which are aligned with the coordinate axes, have their center in $[0,1]^d$, and their parameters bounded by some constant $D > 0$. Without loss of generality, we assume that $D \in \mathbb{N}$. For $x = (x_1, \dots, x_d) \in [0,1]^d$ and all $r = (r_1, \dots, r_d) \in [0, D]^d$, we set

$$E(x, r) := \Big\{y \in \mathbb{R}^d : \sum_{i=1}^{d} \frac{(x_i - y_i)^2}{r_i^2} \le 1\Big\}.$$

We denote by $\mathcal{F}$ the class of indicator functions of these ellipsoids, i.e.

$$\mathcal{F} := \{1_{E(x,r)} : x \in [0,1]^d,\ r \in [0, D]^d\}.$$

We have the following upper bound.

Proposition 3.6. Let $\mu$ be a probability distribution on $\mathbb{R}^d$ with a density bounded by some $B > 0$. Then there exists a constant $C = C_{d,B,D} > 0$ such that

$$N(\varepsilon, C\varepsilon^{-2\alpha s}, \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_{d,B}(\varepsilon^{-2ds}) \quad \text{as } \varepsilon \to 0,$$

where $\mathcal{G} = \mathcal{H}_\alpha(\mathbb{R}^d, [0,1])$.

Proof. Let $\varepsilon > 0$ be fixed and $m = \lfloor \varepsilon^{-s} \rfloor$. For all $i = (i_1, \dots, i_d) \in \{0, \dots, m\}^d$, we denote by $I_i$ the rectangle $\big[\frac{i_1 - 1}{m}, \frac{i_1}{m}\big] \times \cdots \times \big[\frac{i_d - 1}{m}, \frac{i_d}{m}\big]$. Then, for $i \in \{1, \dots, m\}^d$ and $j = (j_1, \dots, j_d) \in \{0, \dots, Dm - 1\}^d$, we define the sets

$$U_{i,j} := \bigcup_{x\in I_i} E\Big(x, \frac{j}{m}\Big) = \Big\{y \in \mathbb{R}^d : \min_{x\in I_i} \sum_{k=1}^{d} \frac{(x_k - y_k)^2}{j_k^2} \le \frac{1}{m^2}\Big\}$$

and

$$L_{i,j} := \bigcap_{x\in I_i} E\Big(x, \frac{j}{m}\Big) = \Big\{y \in \mathbb{R}^d : \max_{x\in I_i} \sum_{k=1}^{d} \frac{(x_k - y_k)^2}{j_k^2} \le \frac{1}{m^2}\Big\}.$$

We introduce the bracket $[l_{i,j}, u_{i,j}]$ given by

$$l_{i,j}(x) := T\big[L_{i,j-1},\ \mathbb{R}^d \setminus L_{i,j}\big](x) \qquad \text{and} \qquad u_{i,j}(x) := T\big[U_{i,j+1},\ \mathbb{R}^d \setminus U_{i,j+2}\big](x),$$

where we use the convention that an ellipsoid with one negative parameter is the empty set. By Lemma 3.1, these functions are $\alpha$-Hölder. Further, we have the following lemma, which is proved in the appendix:

Lemma 3.7. For all $j \in \{0, \dots, Dm - 1\}^d$, $x \in \mathbb{R}^d$, we have

$$d\Big(E\Big(x, \frac{j}{m}\Big),\ \mathbb{R}^d \setminus E\Big(x, \frac{j + 1}{m}\Big)\Big) \ge D^{-1}m^{-2}.$$

As a consequence we infer that the distance between $U_{i,j}$ and $\mathbb{R}^d \setminus U_{i,j+1}$ is at least $D^{-1}m^{-2}$ and the distance between $L_{i,j}$ and $\mathbb{R}^d \setminus L_{i,j+1}$ is at least $D^{-1}m^{-2}$. Thus, by Lemma 3.1 we have

$$\|l_{i,j}\|_\alpha \le 1 + 3D^{\alpha}m^{2\alpha} \le 1 + 3D\varepsilon^{-2\alpha s},$$

and the same bound holds for $\|u_{i,j}\|_\alpha$.

Now, to bound $\|u_{i,j} - l_{i,j}\|_s$ we need to estimate the Lebesgue measures of $U_{i,j}$ and $L_{i,j}$. Recall that, if $j = (j_1, \dots, j_d)$ is a vector of nonnegative parameters and $x \in \mathbb{R}^d$, the Lebesgue measure of the ellipsoid $E(x, j)$ is given by

$$\lambda(E(x, j)) = c_d \prod_{k=1}^{d} j_k,$$

where $c_d$ is the constant $\frac{\pi^{d/2}}{\Gamma(d/2 + 1)}$. The set $U_{i,j}$ can be seen as the set constructed as follows: start from an ellipsoid of parameters $j/m$ centered at the center of $I_i$, cut it along its hyperplanes of symmetry, and shift each obtained component away from the center by a distance of $1/2m$ in every direction; $U_{i,j}$ is then the convex hull of these $2^d$ components (see Figure 1 for the dimension 2). Let us denote by $V_{i,j}$ the set that has been added to the $2^d$ components to obtain the convex hull. We can bound the volume of $U_{i,j}$ by the volume of the ellipsoid plus a bound on the volume of $V_{i,j}$, that is

$$\lambda(U_{i,j}) \le c_d \prod_{k=1}^{d} \frac{j_k}{m} + \frac{1}{m} \sum_{k=1}^{d} \prod_{l \ne k} \frac{2j_l + 1}{m}.$$

The set $L_{i,j}$ can be seen as the intersection of the $2^d$ ellipsoids of parameters $j/m$ centered at each corner of the hypercube $I_i$ (see Figure 2 for the dimension 2). Its volume is larger than the volume of an ellipsoid of parameters $j/m$ minus the volume of $V_{i,j}$. We thus have

$$\lambda(L_{i,j}) \ge c_d \prod_{k=1}^{d} \frac{j_k}{m} - \frac{1}{m} \sum_{k=1}^{d} \prod_{l \ne k} \frac{2j_l + 1}{m}.$$

[Figure 1. $U_{i,j}$ in dimension 2]
[Figure 2. $L_{i,j}$ in dimension 2]

Since $\mu$ has a bounded density with respect to Lebesgue measure, we have

$$\|l_{i,j} - u_{i,j}\|_s^s \le \mu(U_{i,j+2} \setminus L_{i,j-1}) \le B\lambda(U_{i,j+2}) - B\lambda(L_{i,j-1}).$$

We infer $\|l_{i,j} - u_{i,j}\|_s \le c_{d,B}\,\varepsilon$ as $\varepsilon \to 0$, where the constant $c_{d,B}$ only depends on $d$ and $B$.

Now, if $f$ belongs to $\mathcal{F}$, then $f = 1_{E(x,r)}$ for some $x \in [0,1]^d$ and $r \in [0, D]^d$. Thus, there exist some $i = (i_1, \dots, i_d) \in \{0, \dots, m\}^d$ and $j \in \{0, \dots, Dm - 1\}^d$ such that

$$x \in \Big[\frac{i_1 - 1}{m}, \frac{i_1}{m}\Big] \times \cdots \times \Big[\frac{i_d - 1}{m}, \frac{i_d}{m}\Big]$$

and for each $k = 1, \dots, d$,

$$\frac{j_k}{m} \le r_k \le \frac{j_k + 1}{m}.$$

We then have $l_{i,j} \le f \le u_{i,j}$.

Thus, the $D^d m^{2d}$ brackets $[l_{i,j}, u_{i,j}]$, $i \in \{1, \dots, m\}^d$ and $j \in \{0, \dots, Dm - 1\}^d$, cover the class $\mathcal{F}$. Therefore, there exists a $C_{d,B,D} > 0$ such that $N(\varepsilon, C_{d,B,D}\varepsilon^{-2\alpha s}, \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_{d,B}(\varepsilon^{-2ds})$, as $\varepsilon \to 0$. □

3.4. Example 4: Indicators of Uniformly Bounded Multidimensional Ellipsoids.

In Example 3, we only considered indicators of ellipsoids centered in a compact subset of $\mathbb{R}^d$, namely the unit cube. The following lemma will allow us to extend such results to indicators of sets in the whole $\mathbb{R}^d$, at the cost of a moderate additional assumption and a marginal increase of the bracketing numbers.

Lemma 3.8. Let $\mu$ be a measure with continuous distribution function $F$, and $s \ge 1$. Furthermore let $\mathcal{F} := \{1_S : S \in \mathcal{S}\}$, where $\mathcal{S}$ is a class of measurable sets of diameter not larger than $D \ge 1$, and $\mathcal{G} = \mathcal{H}_\alpha(\mathbb{R}^d, [0,1])$. Assume that there are constants $p, q \in \mathbb{N}$, $C > 0$, and a function $f : \mathbb{R}_+ \to \mathbb{R}_+$, such that for any $K > 0$ we have

$$N(\varepsilon, f(\varepsilon), \mathcal{F}_K, \mathcal{G}, L^s(\mu)) \le CK^p\varepsilon^{-q}, \tag{3.2}$$

for sufficiently small $\varepsilon$, where $\mathcal{F}_K := \{1_S : S \in \mathcal{S},\ S \subset [-K, K]^d\}$. If there are some constants $b, \beta > 0$ such that

$$\mu(\{x \in \mathbb{R}^d : |x| > t\}) \le bt^{-\beta}, \tag{3.3}$$

for all sufficiently large $t$, then

$$N\big(\varepsilon, \max\big\{f(\varepsilon),\ 4\sqrt{d}\,(\omega_F^{-1}(2^{-(d+1)}\varepsilon^{s}))^{-\alpha}\big\}, \mathcal{F}, \mathcal{G}, L^s(\mu)\big) = O_{\beta,b,C,D}\big(\varepsilon^{-(\beta^{-1}ps + q)}\big) \quad \text{as } \varepsilon \to 0,$$

where $\omega_F$ is the modulus of continuity of $F$.

The proof is postponed to the appendix.

Proposition 3.9. Let $\mathcal{F}$ denote the class of indicators of ellipsoids of diameter uniformly
bounded by $D > 0$, which are aligned with coordinate axes (and arbitrary centers in the whole
space $\mathbb{R}^d$). If $\mu$ is a measure on $\mathbb{R}^d$ with a density bounded by $B > 0$ and if furthermore (3.3)
holds for some $\beta > 0$ and $b > 0$, then there exists a constant $C = C_{d,B,D} > 0$ such that

$$N(\varepsilon, C\varepsilon^{-2\alpha s}, \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_{\beta,b,d,B,D,s}\bigl(\varepsilon^{-(\beta+2)ds}\bigr) \quad\text{as } \varepsilon \to 0,$$

where $\mathcal{G} = \mathcal{H}_\alpha(\mathbb{R}^d, [0,1])$.

Proof. In the situation of Example 3, change the set of the centers of the ellipsoids from $[0,1]^d$ to
$[-K, K]^d$ and apply Lemma 3.8. Following the proof of Proposition 3.6, we can easily see
that condition (3.2) holds for $p = d$, $q = 2ds$ and $f(\varepsilon) = C_{d,B,D}\,\varepsilon^{-2\alpha s}$. Note that since
we have a bounded density, we have $\omega_F(x) \le Bx$ and therefore
$4\sqrt{d}\,(\omega_F^{-1}(2^{-(d+1)}\varepsilon^s))^{-\alpha} \le 4\sqrt{d}\,(2^{d+1}B)^{\alpha}\varepsilon^{-\alpha s} \le C_{d,B,D}\,\varepsilon^{-2\alpha s}$ for sufficiently small $\varepsilon$. □



Remark 3.10. In the situation of Proposition 3.9, for the class $\mathcal{F}'$ of indicators of balls in
$\mathbb{R}^d$ with uniformly bounded diameter, we can obtain the slightly sharper bound

N(e,Ce- as , F',g,L s (fM)) = O PMB>]D>s (e-^ ds+1 >) as 

for some $C = C'_{d,B} > 0$ by applying Lemma 3.8 directly to the situation in Example 2 and
using the same arguments as in the previous example.

3.5. Example 5: Indicators of Balls of an Arbitrary Metric with Common Center. 

Let $(X, d)$ be a metric space and fix $x_0 \in X$. An $x_0$-centered ball is given by

$$B(t) := \{x \in X : d(x_0, x) \le t\}.$$

We have the following bound on the bracketing numbers of the class $\mathcal{F} := \{1_{B(t)} : t > 0\}$.

Proposition 3.11. Let $s \ge 1$ and $\gamma > 1$. If for the probability measure $\mu$ on $X$ the modulus
of continuity $\omega_G$ of the function $G(t) := \mu(B(t))$ satisfies

$$\omega_G(x) = O(|\log x|^{-s\gamma}) \quad\text{as } x \to 0, \tag{3.4}$$

then there is a constant $C = C_G > 0$ such that

$$N(\varepsilon, \exp(C\varepsilon^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)) = O(\varepsilon^{-s}) \quad\text{as } \varepsilon \to 0,$$

where $\mathcal{G} = \mathcal{H}_\alpha(X, [0,1])$.
Remark 3.12. Note that in the case that $X = \mathbb{R}^2$, $d\mu(t) = \rho(t)\,dt$, the metric $d$ is given by
the Euclidean norm, and $x_0 = 0$, an equivalent condition to (3.4) is

$$\sup_{r>0} \int_r^{r+x} \int_0^{2\pi} t\,\rho(te^{i\varphi})\,d\varphi\,dt = O(|\log x|^{-s\gamma}) \quad\text{as } x \to 0.$$

Proof of Proposition 3.11. Fix $\varepsilon > 0$ and choose $m = \lfloor 3\varepsilon^{-s} + 1 \rfloor$. Let $G^{-1}$ denote the
pseudo-inverse of $G$ and set for $i \in \{1, \dots, m\}$

$$r_i := G^{-1}\Bigl(\frac{i}{m}\Bigr), \qquad B_i := B(r_i).$$

For convenience set $B_{-1}, B_0 := \emptyset$ and $B_{m+1} = X$. Define

$$l_i(x) := T[B_{i-2}, X \setminus B_{i-1}](x) \quad\text{and}\quad u_i(x) := T[B_i, X \setminus B_{i+1}](x).$$

The system $\{[l_i, u_i] : i \in \{1, \dots, m\}\}$ is a covering for $\mathcal{F}$. Obviously

$$\|u_i - l_i\|_s^s \le \mu(B_{i+1} \setminus B_{i-2}) \le \frac{3}{m} \le \varepsilon^s.$$



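By Lemma 3.1, we have

$$\|u_i\|_\alpha \le 1 + \frac{3^\alpha}{d(B_i, X \setminus B_{i+1})^\alpha} \le 1 + \frac{3^\alpha}{(r_{i+1} - r_i)^\alpha}.$$

Since by condition (3.4)

$$r_{i+1} - r_i \ge \inf\Bigl\{x > 0 : \exists t \in \mathbb{R} \text{ such that } G(t+x) - G(t) \ge \frac{1}{m}\Bigr\}
\ge \inf\Bigl\{x > 0 : \omega_G(x) \ge \frac{1}{m}\Bigr\}
\ge \exp\bigl(-c_G\, m^{1/(s\gamma)}\bigr)$$

for some constant $c_G > 0$, there is a constant $C_G > 0$ such that

$$\|u_i\|_\alpha \le 1 + 3^\alpha \exp\bigl(\alpha c_G\, m^{1/(s\gamma)}\bigr) \le \exp\bigl(C_G\,\varepsilon^{-1/\gamma}\bigr).$$

Analogously, we can show that $\|l_i\|_\alpha \le \exp(C_G\,\varepsilon^{-1/\gamma})$. This implies that all $[l_i, u_i]$ are
$(\varepsilon, \exp(C_G\,\varepsilon^{-1/\gamma}), \mathcal{G}, L^s(\mu))$-brackets and thus the proposition is proved. □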
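The bracket construction in this proof can be checked numerically in a simple special case. The following Python sketch is an illustration under stated assumptions, not part of the original argument: it takes $X = \mathbb{R}$, $x_0 = 0$ and $\mu$ uniform on $[-1,1]$, so that $G(t) = t$ on $[0,1]$ and $r_i = i/m$; it uses the form $T[A,B] = d_B/(d_A + d_B)$ that can be read off from the proof of Lemma 3.1 in the Appendix; and it adopts the convention $l_i \equiv 0$ when $B_{i-2} = \emptyset$ and $u_m \equiv 1$. The names `transition`, `bracket` and `check_brackets` are hypothetical.

```python
import numpy as np


def transition(x, a, b):
    """T[A, B](x) = d_B(x) / (d_A(x) + d_B(x)) on the real line,
    for A = {|x| <= a} and B = {|x| >= b} with 0 <= a < b (assumed setting)."""
    r = np.abs(np.asarray(x, dtype=float))
    d_A = np.maximum(r - a, 0.0)  # distance to the inner ball A
    d_B = np.maximum(b - r, 0.0)  # distance to the complement of the outer ball
    return d_B / (d_A + d_B)      # denominator is at least b - a > 0


def bracket(i, m, x):
    """Functions l_i, u_i of the proof for the radii r_k = k / m."""
    x = np.asarray(x, dtype=float)
    # Convention (assumption): l_i = 0 when B_{i-2} is empty, u_m = 1.
    lower = transition(x, (i - 2) / m, (i - 1) / m) if i >= 3 else np.zeros_like(x)
    upper = transition(x, i / m, (i + 1) / m) if i < m else np.ones_like(x)
    return lower, upper


def check_brackets(eps, s, grid_size=3001):
    """Check l_i <= 1_{B(t)} <= u_i and a grid approximation of
    ||u_i - l_i||_s^s <= 3 / m for the uniform law on [-1, 1]."""
    m = int(np.floor(3.0 * eps ** (-s) + 1.0))
    x = np.linspace(-1.5, 1.5, grid_size)
    on_support = np.abs(x) <= 1.0
    for i in range(1, m + 1):
        lower, upper = bracket(i, m, x)
        t = (i - 0.5) / m  # a radius with r_{i-1} <= t <= r_i
        indicator = (np.abs(x) <= t).astype(float)
        if not (np.all(lower <= indicator) and np.all(indicator <= upper)):
            return False
        if np.mean((upper - lower)[on_support] ** s) > 3.0 / m:
            return False
    return True
```

Calling `check_brackets(0.5, 2)` runs the check for $m = 13$ brackets. The grid only approximates the $L^s(\mu)$-norm, so this is a sanity check rather than a proof.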

3.6. Example 6: A Class of Monotone Functions. In this example, we choose $X = \mathbb{R}$.
We consider the case of a one-parameter class of functions $\mathcal{F} = \{f_t : t \in [0,1]\}$, where $f_t$ are
functions from $\mathbb{R}$ to $\mathbb{R}$ with the properties:

(i) for all $t \in [0,1]$ and $x \in \mathbb{R}$, $0 \le f_t(x) \le 1$;

(ii) for all $0 \le s \le t \le 1$, $f_s \le f_t$;

(iii) for all $t \in [0,1]$, $f_t$ is non-decreasing on $\mathbb{R}$.

Note that everything that follows remains true if in (iii), non-decreasing is replaced by non-increasing.
Further, for a probability measure $\mu$ on $\mathbb{R}$, we define $G_\mu(t) = \mu f_t$ and we say that $G_\mu$ is
Lipschitz with Lipschitz constant $\lambda > 0$ if $|G_\mu(t) - G_\mu(s)| \le \lambda|t - s|$, for all $s, t \in [0,1]$.

Empirical processes indexed by a 1-parameter class of functions arise, e.g. in the study of
empirical U-processes; see Borovkova et al. (2001). The empirical U-distribution function
with kernel function $g(x,y)$ is defined as

$$U_n(t) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} 1_{\{g(X_i, X_j) \le t\}}.$$

Then, the first order term in the Hoeffding decomposition is given by

n 
i=l 

where $g_t(x) = P(g(x, X_1) \le t)$. For this class of functions, conditions (i) and (ii) are
automatically satisfied. Condition (iii) holds if $g(x,y)$ is monotone in $x$. This is e.g. the
case for the kernel $g(x,y) = y - x$, which arises in the study of the empirical correlation
integral; see Borovkova et al. (2001).
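The definition of the empirical U-distribution function translates directly into code. The sketch below is a minimal illustration, assuming a finite sample given as a Python sequence and a kernel passed as a callable; the function name `empirical_u_distribution` is hypothetical and the quadratic-time loop over pairs is chosen for clarity, not efficiency.

```python
from itertools import combinations


def empirical_u_distribution(sample, kernel, t):
    """Fraction of pairs i < j with kernel(X_i, X_j) <= t.

    `combinations` yields the pairs in the order of the sample,
    so the first argument of the kernel always has the smaller index.
    """
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    hits = sum(1 for x, y in pairs if kernel(x, y) <= t)
    return hits / len(pairs)
```

For the sample `[0.0, 1.0, 3.0]` and the kernel $g(x,y) = y - x$, the three pairs give the values 1, 3 and 2, so the function takes the value $2/3$ at $t = 2$.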

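Proposition 3.13. Let $s \ge 1$ and $\gamma > 1$. Let $\mu$ be a probability measure on $\mathbb{R}$ such that its
distribution function $F$ satisfies

$$\omega_F(x) = O(|\log(x)|^{-s\gamma}) \quad\text{as } x \to 0, \tag{3.5}$$

and such that $G_\mu$ is Lipschitz with Lipschitz constant $\lambda > 0$. Then there exists a $C = C_F > 0$
such that

$$N(\varepsilon, \exp(C\varepsilon^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)) = O_\lambda(\varepsilon^{-s}) \quad\text{as } \varepsilon \to 0,$$

where $\mathcal{G} = \mathcal{H}_\alpha(\mathbb{R}, [0,1])$.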

Proof. Let $\varepsilon > 0$ and $m = \lfloor (\lambda + 4)\varepsilon^{-s} + 1 \rfloor$. For $i = 0, \dots, m$, we set

$$t_i = \frac{i}{m} \quad\text{and}\quad x_i = F^{-1}\Bigl(\frac{i}{m}\Bigr).$$

We always have $x_m = +\infty$, but $x_0$ could be finite or $-\infty$. In order to simplify the notation,
in the first case, we change to $x_0 = -\infty$.

We define, for $j \in \{1, \dots, m\}$, the functions $l_j$ and $u_j$ as follows. If $k \in \{1, \dots, m-1\}$,
we set $l_j(x_k) = f_{t_{j-1}}(x_{k-1})$ and $u_j(x_k) = f_{t_j}(x_{k+1})$, where we have to understand $f(\pm\infty)$ as
$\lim_{x \to \pm\infty} f(x)$. If $k \in \{0, \dots, m-1\}$ and $x \in (x_k, x_{k+1})$, we define $l_j(x)$ and $u_j(x)$ by the
linear interpolations,

$$l_j(x) = l_j(x_k) + (x - x_k)\,\frac{l_j(x_{k+1}) - l_j(x_k)}{x_{k+1} - x_k},$$

$$u_j(x) = u_j(x_k) + (x - x_k)\,\frac{u_j(x_{k+1}) - u_j(x_k)}{x_{k+1} - x_k},$$

with the exceptions that $l_j(x) = l_j(x_1) = f_{t_{j-1}}(-\infty)$ if $x \in (-\infty, x_1)$ and $u_j(x) = u_j(x_{m-1}) = f_{t_j}(+\infty)$
if $x \in (x_{m-1}, +\infty)$. Then it is clear that for all $t_{j-1} \le t \le t_j$, we have $l_j \le f_t \le u_j$,
i.e. $f_t$ belongs to the bracket $[l_j, u_j]$.

Further, being piecewise affine functions, $l_j$ and $u_j$ are $\alpha$-Hölder continuous functions with
Hölder norm

$$\|l_j\|_\alpha \le 1 + \max_{k=1,\dots,m} \frac{|l_j(x_k) - l_j(x_{k-1})|}{(x_k - x_{k-1})^\alpha} \le 1 + \max_{k=1,\dots,m} \frac{1}{(x_k - x_{k-1})^\alpha} \le 1 + \exp\bigl(C_F\, m^{1/(s\gamma)}\bigr).$$

Here we have used the condition (3.5) and the same computation as for the class of indicators
of rectangles. Analogously, the same bound holds for $u_j$.
Now,

$$\|u_j - l_j\|_s^s \le \|u_j - l_j\|_1 \le \|u_j - f_{t_j}\|_1 + \|f_{t_j} - f_{t_{j-1}}\|_1 + \|l_j - f_{t_{j-1}}\|_1.$$

First, since $G_\mu$ is Lipschitz, we have

$$\|f_{t_j} - f_{t_{j-1}}\|_1 \le G_\mu(t_j) - G_\mu(t_{j-1}) \le \lambda(t_j - t_{j-1}) = \frac{\lambda}{m}.$$
For $x \in [x_{k-1}, x_k]$, since $f_{t_j}$ is nondecreasing, we have $u_j(x) \le f_{t_j}(x_{k+1})$ and $f_{t_j}(x) \ge f_{t_j}(x_{k-1})$,
thus

$$\|u_j - f_{t_j}\|_1 \le \sum_{k=1}^{m-1} |f_{t_j}(x_{k+1}) - f_{t_j}(x_{k-1})|\,\mu([x_{k-1}, x_k])
\le \frac{1}{m} \sum_{k=1}^{m-1} \bigl(|f_{t_j}(x_{k+1}) - f_{t_j}(x_k)| + |f_{t_j}(x_k) - f_{t_j}(x_{k-1})|\bigr)
\le \frac{2}{m},$$

since, by monotonicity, $\sum_{k=0}^{m-1} |f_{t_j}(x_{k+1}) - f_{t_j}(x_k)| \le 1$. In the same way we get $\|l_j - f_{t_{j-1}}\|_1 \le \frac{2}{m}$
and we infer

$$\|u_j - l_j\|_s \le \Bigl(\frac{\lambda + 4}{m}\Bigr)^{1/s} \le \varepsilon.$$

Thus, the number of $(\varepsilon, \exp(C_F\,\varepsilon^{-1/\gamma}), \mathcal{G}, L^s(\mu))$-brackets needed to cover the class $\mathcal{F}$ is bounded
by $m$, which proves the proposition. □
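The choice of $m$ in this proof can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: it computes $m = \lfloor (\lambda+4)\varepsilon^{-s} + 1 \rfloor$ and the resulting bound $((\lambda+4)/m)^{1/s}$ on the bracket size. The function names are hypothetical, and floating-point rounding means that comparisons against $\varepsilon$ should allow a small tolerance.

```python
import math


def monotone_bracket_count(eps, s, lam):
    """m = floor((lam + 4) * eps**(-s) + 1), the number of brackets in the proof."""
    return math.floor((lam + 4.0) * eps ** (-s) + 1.0)


def bracket_size_bound(eps, s, lam):
    """Upper bound ((lam + 4) / m)**(1/s) on the L^s-size of each bracket."""
    m = monotone_bracket_count(eps, s, lam)
    return ((lam + 4.0) / m) ** (1.0 / s)
```

Since $m > (\lambda+4)\varepsilon^{-s}$, the bound is always smaller than $\varepsilon$; for instance `eps=0.5, s=2, lam=1.0` gives $m = 21$.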

4. Application to Ergodic Torus Automorphisms

We can apply Theorem 2.1 to the empirical process of ergodic torus automorphisms. Let
$\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$ be the torus of dimension $d > 1$, which is identified with $[0,1]^d$. If $A$ is a square
matrix of dimension $d$ with integer coefficients and determinant $\pm 1$, then the transformation
$T : \mathbb{T}^d \to \mathbb{T}^d$ defined by

$$Tx = Ax \bmod 1$$

is an automorphism of $\mathbb{T}^d$ that preserves the Lebesgue measure $\lambda$. Thus $(\mathbb{T}^d, \mathcal{B}(\mathbb{T}^d), \lambda, T)$
is a measure preserving dynamical system. It is ergodic if and only if the matrix $A$ has no
eigenvalue which is a root of unity. A result of Kronecker shows that in this case, $A$ always has
at least one eigenvalue whose modulus is different from 1. The hyperbolic automorphisms
(i.e. no eigenvalue of modulus 1) are particular cases of Anosov diffeomorphisms. Their
properties are better understood than in the general case. However, the general case of
ergodic automorphisms is an example of a partially hyperbolic system for which strong
results can be proved. The central limit theorem for regular observables has been proved
by Leonov (1960), see also Le Borgne (1999) for refinements. Other limit theorems can be
found in Dolgopyat (2004). The one-dimensional empirical process, for $\mathbb{R}$-valued regular
observables, has been studied by Durieu and Jouan (2008). Dehling and Durieu (2011)
proved weak convergence of the classical empirical process (indexed by indicators of left
infinite rectangles). We can now generalize this result to empirical processes indexed by
further classes of functions. We obtain the following result as a corollary of Theorem
2.1 and the results of the preceding section.
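The ergodicity and hyperbolicity criteria stated above can be tested numerically for a given integer matrix. The following Python sketch is a heuristic illustration, not an exact algorithm: it relies on floating-point eigenvalues from NumPy and a tolerance `tol`, both of which are assumptions of this example. It uses the fact that a root of unity of order $k$ which is an eigenvalue of an integer $d \times d$ matrix satisfies $\varphi(k) \le d$, and hence $k \le 2d^2$ because $\varphi(k) \ge \sqrt{k/2}$. The function name `classify_torus_automorphism` is hypothetical.

```python
import numpy as np


def classify_torus_automorphism(A, tol=1e-8):
    """Numerically classify T x = A x mod 1 for an integer matrix A with det = +-1."""
    A = np.asarray(A)
    d = A.shape[0]
    if A.ndim != 2 or A.shape != (d, d) or not np.issubdtype(A.dtype, np.integer):
        raise ValueError("A must be a square integer matrix")
    if abs(abs(np.linalg.det(A)) - 1.0) > 1e-6:
        raise ValueError("determinant must be +1 or -1")
    eigenvalues = np.linalg.eigvals(A)
    # Orders k of possible root-of-unity eigenvalues satisfy phi(k) <= d, so k <= 2 d^2.
    max_order = 2 * d * d
    has_root_of_unity = any(
        abs(lam ** k - 1.0) < tol
        for lam in eigenvalues
        for k in range(1, max_order + 1)
    )
    # Hyperbolic means that no eigenvalue lies on the unit circle.
    hyperbolic = all(abs(abs(lam) - 1.0) > tol for lam in eigenvalues)
    return {"ergodic": not has_root_of_unity, "hyperbolic": hyperbolic}
```

For example, the matrix `[[2, 1], [1, 1]]` is classified as ergodic and hyperbolic, while the rotation `[[0, 1], [-1, 0]]` has eigenvalues $\pm i$ and is not ergodic. The companion matrix of $x^4 - x^3 - x^2 - x + 1$ has two eigenvalues on the unit circle that are not roots of unity, which gives an ergodic automorphism that is not hyperbolic.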



Theorem 4.1. Let $T$ be an ergodic $d$-torus automorphism and let $\mathcal{F}$ be one of the following
classes:

• the class of indicators of rectangles of $\mathbb{T}^d$;

• the class of indicators of Euclidean balls of $\mathbb{T}^d$;

• the class of indicators of ellipsoids of bounded diameter of $\mathbb{T}^d$.

Then the empirical process

$$U_n(f) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \bigl(f \circ T^i - \lambda f\bigr), \qquad f \in \mathcal{F},$$

converges in distribution in $\ell^\infty(\mathcal{F})$ to a tight Gaussian process $(W(f))_{f \in \mathcal{F}}$.

Proof. Let $\mathcal{F}$ be one of the classes of functions and $\mathcal{B}$ be the class of $\alpha$-Hölder functions for
some $\alpha > 1/2$. We set $\mathcal{G}$ the subclass of $\mathcal{B}$ given by the functions bounded by 1. We consider
the $\mathbb{T}^d$-valued stationary process $X_i = T^i$. Since the distribution of $X_0$ is the Lebesgue
measure on $\mathbb{T}^d$, Propositions 3.2, 3.5 and 3.6 show that the condition (2.4) holds for every
possible choice of class $\mathcal{F}$. For all $f \in \mathcal{B}$, the central limit theorem (2.1) holds; see Leonov
(1960) and Le Borgne (1999). Dehling and Durieu (2011), Proposition 3, show that the
ergodic automorphisms of the torus satisfy the multiple mixing property (2.3) for functions
of the class $\mathcal{G}$, and with the constants $\ell = 1$ and $d_0$ the size of the biggest Jordan block
of $T$ restricted to its neutral subspace. Thus the $2p$-th moment bound (2.2) holds, and
Theorem 2.1 can be applied to conclude. □
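To make the objects in Theorem 4.1 concrete, the following Python sketch evaluates the empirical process at a single rectangle indicator along an orbit of a torus automorphism. It is an illustration under stated assumptions: the orbit is computed in floating-point arithmetic, so rounding errors grow along expanding directions and long orbits are only pseudo-orbits; the rectangle is taken half-open and does not wrap around the torus; and the function names `orbit` and `empirical_process_rectangle` are hypothetical.

```python
import numpy as np


def orbit(A, x0, n):
    """Return the points T x0, ..., T^n x0 for T x = A x mod 1 (floating point)."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float)
    points = np.empty((n, x.size))
    for i in range(n):
        x = (A @ x) % 1.0
        points[i] = x
    return points


def empirical_process_rectangle(points, lower, upper):
    """sqrt(n) * (empirical frequency - Lebesgue measure) of [lower, upper)."""
    points = np.asarray(points, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = np.all((points >= lower) & (points < upper), axis=1)
    volume = np.prod(upper - lower)  # Lebesgue measure of the rectangle
    return np.sqrt(len(points)) * (inside.mean() - volume)
```

A typical call is `empirical_process_rectangle(orbit([[2, 1], [1, 1]], [0.1, 0.2], 1000), [0, 0], [0.5, 0.5])`. Repeating this for many starting points gives an empirical impression of the fluctuations whose Gaussian limit the theorem describes.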



5. Proof of the Main Theorem

In the proof of Theorem 2.1 we need a generalization of Theorem 4.2 of Billingsley (1968).
Billingsley considers random variables $X_n$, $X_n^{(m)}$, $X^{(m)}$, $X$, $m, n \ge 1$, with values in a
separable metric space $(S, \rho)$ satisfying (a) $X_n^{(m)} \xrightarrow{\mathcal{D}} X^{(m)}$ as $n \to \infty$, for all $m \ge 1$, (b)
$X^{(m)} \xrightarrow{\mathcal{D}} X$ as $m \to \infty$ and (c) $\forall \delta > 0$, $\limsup_{n \to \infty} P(\rho(X_n^{(m)}, X_n) > \delta) \to 0$ as $m \to \infty$.
Theorem 4.2 of Billingsley (1968) states that then $X_n \xrightarrow{\mathcal{D}} X$. Dehling et al. (2009) proved
that this result holds without condition (b), provided that $S$ is a complete separable metric
space. More precisely, they could show that in this situation (a) and (c) together imply the
existence of a random variable $X$ satisfying (b), and thus by Billingsley's theorem $X_n \xrightarrow{\mathcal{D}} X$.
Here we will generalize this theorem to possibly non-measurable random elements with values
in non-separable spaces. Regarding convergence in distribution of non-measurable random
elements, we use the notation of van der Vaart and Wellner (1996). In accordance with the
terminology of van der Vaart and Wellner (1996), we will call a not necessarily measurable
function with values in a measurable space a random element.

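Theorem 5.1. Let $X_n$, $X_n^{(m)}$, $X^{(m)}$, $m, n \ge 1$, be random elements with values in a complete
metric space $(S, \rho)$, and suppose that $X^{(m)}$ is measurable and separable, i.e. there is a
separable set $S^{(m)} \subset S$ such that $P(X^{(m)} \in S^{(m)}) = 1$. If the conditions

$$X_n^{(m)} \xrightarrow{\mathcal{D}} X^{(m)} \quad\text{as } n \to \infty, \text{ for all } m \ge 1, \tag{5.1}$$

$$\limsup_{n \to \infty} P^*(\rho(X_n, X_n^{(m)}) > \delta) \to 0 \quad\text{as } m \to \infty, \text{ for all } \delta > 0 \tag{5.2}$$

are satisfied, then there exists an $S$-valued, separable random variable $X$ such that $X^{(m)} \xrightarrow{\mathcal{D}} X$
as $m \to \infty$, and

$$X_n \xrightarrow{\mathcal{D}} X \quad\text{as } n \to \infty.$$

The proof is postponed to the Appendix.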



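Proof of Theorem 2.1. For all $q \ge 1$, there exist two sets of $N_q := N(2^{-q}, \exp(C2^{q/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu))$
functions $\{g_{q,1}, \dots, g_{q,N_q}\} \subset \mathcal{G}$ and $\{g'_{q,1}, \dots, g'_{q,N_q}\} \subset \mathcal{G}$, such that

$$\|g_{q,i} - g'_{q,i}\|_s \le 2^{-q}, \tag{5.3}$$

$\|g_{q,i}\|_{\mathcal{B}} \le \exp(C2^{q/\gamma})$, $\|g'_{q,i}\|_{\mathcal{B}} \le \exp(C2^{q/\gamma})$ and for all $f \in \mathcal{F}$, there exists an $i$ such that
$g_{q,i} \le f \le g'_{q,i}$. Further, by (2.4),

$$\sum_{q \ge 1} 2^{-(r+1)q} N_q^2 < +\infty. \tag{5.4}$$

For all $q \ge 1$, we can build a partition $\mathcal{F} = \bigcup_{i=1}^{N_q} \mathcal{F}_{q,i}$ of the class $\mathcal{F}$ into $N_q$ subsets such
that for all $f \in \mathcal{F}_{q,i}$, $g_{q,i} \le f \le g'_{q,i}$. To see this define $\mathcal{F}_{q,1} = [g_{q,1}, g'_{q,1}]$ and
$\mathcal{F}_{q,i} = [g_{q,i}, g'_{q,i}] \setminus \bigcup_{j=1}^{i-1} \mathcal{F}_{q,j}$.
In the sequel, we will use the notation $\pi_q f = g_{q,i}$ and $\pi'_q f = g'_{q,i}$ if $f \in \mathcal{F}_{q,i}$. For each
$q \ge 1$, we introduce the process

$$F_n^{(q)}(f) := F_n(\pi_q f) = \frac{1}{n} \sum_{i=1}^{n} \pi_q f(X_i), \qquad f \in \mathcal{F},$$

which is constant on each $\mathcal{F}_{q,i}$. Further, if $f \in \mathcal{F}_{q,i}$, we have

$$F_n^{(q)}(f) \le F_n(f) \le F_n(\pi'_q f).$$

We introduce

$$U_n^{(q)}(f) := U_n(\pi_q f) = \sqrt{n}\,\bigl(F_n^{(q)}(f) - \mu(\pi_q f)\bigr).$$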

Proposition 5.2. For all $q \ge 1$, the sequence $(U_n^{(q)}(f))_{f \in \mathcal{F}}$ converges in distribution in
$\ell^\infty(\mathcal{F})$ to a piecewise constant Gaussian process $(U^{(q)}(f))_{f \in \mathcal{F}}$ as $n \to \infty$.

Proof. Since $\pi_q f \in \mathcal{G}$ and $\mathcal{G}$ is a subset of $\mathcal{B}$, by assumption (2.1), the CLT holds and $U_n^{(q)}(f)$
converges to a Gaussian law for all $f \in \mathcal{F}$. We can apply the Cramér-Wold device to get the
finite dimensional convergence: for all $k \ge 1$, for all $f_1, \dots, f_k \in \mathcal{F}$, $(U_n^{(q)}(f_1), \dots, U_n^{(q)}(f_k))$
converges in distribution to a Gaussian vector $(U^{(q)}(f_1), \dots, U^{(q)}(f_k))$ in $\mathbb{R}^k$. Since $U_n^{(q)}$ is
constant on each element $\mathcal{F}_{q,i}$ of the partition, the finite dimensional convergence implies
the weak convergence of the process. Indeed, consider the function $\tau_q : \mathbb{R}^{N_q} \to \ell^\infty(\mathcal{F})$
that maps a vector $x = (x_1, \dots, x_{N_q})$ to the function $\mathcal{F} \to \mathbb{R}$, $f \mapsto x_i$ such that $f \in \mathcal{F}_{q,i}$.
For $f_1 \in \mathcal{F}_{q,1}, \dots, f_{N_q} \in \mathcal{F}_{q,N_q}$ we have $U_n^{(q)} = \tau_q(U_n^{(q)}(f_1), \dots, U_n^{(q)}(f_{N_q}))$ and thus the
continuous mapping theorem guarantees that $U_n^{(q)}$ converges weakly to the random variable
$\tau_q(U^{(q)}(f_1), \dots, U^{(q)}(f_{N_q}))$ which is constant on each $\mathcal{F}_{q,i}$. □



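Proposition 5.3. For all $\varepsilon > 0$, $\eta > 0$ there exists a $q_0$ such that for all $q \ge q_0$

$$\limsup_{n \to \infty} P^*\Bigl(\sup_{f \in \mathcal{F}} |U_n(f) - U_n^{(q)}(f)| > \varepsilon\Bigr) \le \eta.$$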

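Proof. For a random variable $Y$ let $\bar{Y}$ denote its centering $\bar{Y} := Y - EY$. If for arbitrary
random variables $Y_l, Y, Y_u$ we have $Y_l \le Y \le Y_u$ then

$$|\bar{Y} - \bar{Y}_l| \le |\bar{Y}_u - \bar{Y}_l| + E|Y_u - Y_l|.$$

Using $F_n^{(q+K)}(f) \le F_n(f) \le F_n(\pi'_{q+K} f)$ and $E|F_n(\pi'_{q+K} f) - F_n^{(q+K)}(f)| \le 2^{-(q+K)}$ for all
$f \in \mathcal{F}$, we obtain

$$|U_n(f) - U_n^{(q)}(f)| = \Bigl|\sum_{k=1}^{K} \bigl(U_n^{(q+k)}(f) - U_n^{(q+k-1)}(f)\bigr) + U_n(f) - U_n^{(q+K)}(f)\Bigr|$$

$$\le \sum_{k=1}^{K} |U_n^{(q+k)}(f) - U_n^{(q+k-1)}(f)| + |U_n(\pi'_{q+K} f) - U_n^{(q+K)}(f)| + \sqrt{n}\,2^{-(q+K)}.$$

In order to assure $\frac{\varepsilon}{4} \le 2^{-(q+K)}\sqrt{n} \le \frac{\varepsilon}{2}$, for fixed $n$ and $q$, choose $K = K_{n,q}$, where

$$K_{n,q} = \Bigl\lfloor \log\Bigl(\frac{4\sqrt{n}}{2^q \varepsilon}\Bigr) \log(2)^{-1} \Bigr\rfloor.$$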
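The two-sided requirement on $K$ can be checked by a short computation. The Python sketch below is an illustration only: it assumes the choice $K = \lfloor \log_2(4\sqrt{n}/(2^q\varepsilon)) \rfloor$, which is one choice consistent with the requirement $\varepsilon/4 \le 2^{-(q+K)}\sqrt{n} \le \varepsilon/2$, and the function name `chaining_depth` is hypothetical. The value is only meaningful once $n$ is large enough that $K \ge 1$.

```python
import math


def chaining_depth(n, q, eps):
    """K = floor(log2(4 * sqrt(n) / (2**q * eps))) (assumed form of K_{n,q})."""
    return math.floor(math.log2(4.0 * math.sqrt(n) / (2.0 ** q * eps)))


def residual_term(n, q, eps):
    """The deterministic term 2**(-(q + K)) * sqrt(n) for K = chaining_depth(n, q, eps)."""
    K = chaining_depth(n, q, eps)
    return 2.0 ** (-(q + K)) * math.sqrt(n)
```

For $n = 10^6$, $q = 3$ and $\varepsilon = 0.1$ this gives $K = 12$ and a residual term of about $0.0305$, which lies between $\varepsilon/4 = 0.025$ and $\varepsilon/2 = 0.05$.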



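For each $i \in \{1, \dots, N_q\}$, we obtain

$$\sup_{f \in \mathcal{F}_{q,i}} |U_n(f) - U_n^{(q)}(f)| \le \sum_{k=1}^{K_{n,q}} \sup_{f \in \mathcal{F}_{q,i}} |U_n^{(q+k)}(f) - U_n^{(q+k-1)}(f)|
+ \sup_{f \in \mathcal{F}_{q,i}} |U_n(\pi'_{q+K_{n,q}} f) - U_n^{(q+K_{n,q})}(f)| + \frac{\varepsilon}{2}.$$

By taking $\varepsilon_k = \frac{\varepsilon}{4k(k+1)}$, we have $\sum_{k \ge 1} \varepsilon_k = \frac{\varepsilon}{4}$ and we get for each $i \in \{1, \dots, N_q\}$,

$$P^*\Bigl(\sup_{f \in \mathcal{F}_{q,i}} |U_n(f) - U_n^{(q)}(f)| > \varepsilon\Bigr) \le \sum_{k=1}^{K_{n,q}} P^*\Bigl(\sup_{f \in \mathcal{F}_{q,i}} |U_n^{(q+k)}(f) - U_n^{(q+k-1)}(f)| > \varepsilon_k\Bigr)
+ P^*\Bigl(\sup_{f \in \mathcal{F}_{q,i}} |U_n(\pi'_{q+K_{n,q}} f) - U_n^{(q+K_{n,q})}(f)| > \frac{\varepsilon}{4}\Bigr).$$

Notice that the suprema on the right-hand side are in fact maxima over finite numbers of functions,
since the functionals $\pi_q$ and $\pi'_q$ (and thus $U_n^{(q)}$) are constant on the $\mathcal{F}_{q,i}$. Therefore we
can work with standard probability theory from this point: the outer probabilities can be
replaced by usual probabilities on the right-hand side. For each $k$ choose a set $F_k$ composed
of at most $N_{k-1}N_k$ functions of $\mathcal{F}$ in such a way that $F_k$ contains one function in each
non-empty $\mathcal{F}_{k-1,i} \cap \mathcal{F}_{k,j}$, $i = 1, \dots, N_{k-1}$, $j = 1, \dots, N_k$. Then, for each $i \in \{1, \dots, N_q\}$, we have

$$P^*\Bigl(\sup_{f \in \mathcal{F}_{q,i}} |U_n(f) - U_n^{(q)}(f)| > \varepsilon\Bigr) \le \sum_{k=1}^{K_{n,q}} \sum_{f \in \mathcal{F}_{q,i} \cap F_{q+k}} P\bigl(|U_n^{(q+k)}(f) - U_n^{(q+k-1)}(f)| > \varepsilon_k\bigr)
+ \sum_{f \in \mathcal{F}_{q,i} \cap F_{q+K_{n,q}}} P\Bigl(|U_n(\pi'_{q+K_{n,q}} f) - U_n^{(q+K_{n,q})}(f)| > \frac{\varepsilon}{4}\Bigr).$$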



Now using Markov's inequality at the order 2p (p will be chosen later) and assumption (2.2) 
we infer 



P* ( sup \U n (f)-U^(f)\>e 



K n,q V 

^ °p E E —p E ni_p n w - *«+*-i/ui \o g 2p+a ^h q+k f - it q+k -xf\\ B + 1) 



k=l feT q ,ir\F q+k °k j=i 



Crt 



u\ 2p p 



At this point, without loss of generality, we can assume that $a \ge -1$ (if not, take a larger $a$)
and thus the assumption on $\gamma$ reduces to $\gamma > 2 + a$.
Note that by construction, for each $k \ge 1$,

$$\|\pi_{q+k} f - \pi_{q+k-1} f\|_s \le \|\pi_{q+k} f - f\|_s + \|\pi_{q+k-1} f - f\|_s \le 3 \cdot 2^{-(q+k)},$$

$$\|\pi_{q+k} f - \pi_{q+k-1} f\|_{\mathcal{B}} \le 2\exp\bigl(C2^{(q+k)/\gamma}\bigr),$$

$$\|\pi'_{q+k} f - \pi_{q+k} f\|_{\mathcal{B}} \le 2\exp\bigl(C2^{(q+k)/\gamma}\bigr).$$

Thus,



P* ( sup \U n (f) - U<*\f)\ > e 



< 2 2P+1 ^E E #(-^ n ^ +fc ) (Mfe t, 1))2 ^ J " P2 " j(g+fc) log 2 ^(2exp(L72^) + 1), 



j=i fc=i 

and if q is large enough, 



P* ^sup|C/ n (/)-C/W(/)|>£^ 

sup |tf„(/) -I#>(/)| >e 
<^EEE n ^) (fc(fc t 1)) ^ n^2-^)2^)^, 



£ 2p 
i=l j=l fc=l 

where $D$ is a new constant which depends on $p$, $C$, and $C_p$. Since $(\mathcal{F}_{q,i})_{i=1,\dots,N_q}$ is a partition
of $\mathcal{F}$, we have

i=l 

thus we have 



P* (s^|tf B (/)-tfM(/)|>e) 



^ ^ E ^ E ^^^^2^^)^ 
i=i fc=i 

< £ !g_! 2 (p^-)(7 + 2 + a)^ g ^ +fc _ iiV(?+fcA;4p2 ((-^ 7 ) p+ ( 2+2a ),)^ 
i=l fc=l 

p-l (j-p) 7 ~ (2+a) 00 n/ 00 

^ E ^Sh? E jw********-*"? + 1 E w^**? , 

j=i £ 7 fc=i fc=i 

(5.5) 



because $a \ge -1$ and thus $(2+2a)j \le (2+2a)p$, and where $D'$ and $D''$ are positive constants
also depending on $p$, $C$, and $C_p$. As $p\,\frac{2+a-\gamma}{\gamma} \to -\infty$ when $p$ tends to infinity, there exists
some $p \ge 1$ such that $p\,\frac{2+a-\gamma}{\gamma} < -(r+1)$ and thus by (5.4),



J2N k ^N k k 4p 2^ 2+a -^ < ^iV fc 2 _ 1 ^2 p(2+a - 7) HX]^^ 4p 2 p(2+a " 7) - < +oo. 



k=2 



k=2 



k=2 



Therefore the first summand of (5.5) goes to zero as $n$ goes to infinity and the second
summand of (5.5) goes to zero as $q$ goes to infinity. □

Propositions 5.2 and 5.3 establish for the random elements $U_n$, $U_n^{(q)}$, with values in the
complete metric space $\ell^\infty(\mathcal{F})$, conditions (5.1) and (5.2) of Theorem 5.1, respectively. Thus
Theorem 5.1 completes the proof of Theorem 2.1. □



Appendix 

Proof of Lemma 3.1. By the triangle inequality, we have for all $x, y \in X$ that

$$|d_B(x) - d_B(y)| \le d(x,y), \qquad d_B(x) + d_A(x) \ge d(A, B).$$

Therefore,

$$|T[A,B](x) - T[A,B](y)|
= \left|\frac{(d_B(x) - d_B(y))(d_B(y) + d_A(y)) + d_B(y)(d_B(y) + d_A(y)) - d_B(y)(d_B(x) + d_A(x))}{(d_B(x) + d_A(x))(d_B(y) + d_A(y))}\right|$$

$$\le \frac{|d_B(x) - d_B(y)|}{d_B(x) + d_A(x)} + \frac{d_B(y)}{d_B(y) + d_A(y)} \cdot \frac{|(d_B(y) - d_B(x)) + (d_A(y) - d_A(x))|}{d_B(x) + d_A(x)}
\le 3\,\frac{d(x,y)}{d(A,B)}$$

and thus

$$\|T[A,B]\|_\alpha := \|T[A,B]\|_\infty + \sup_{x \ne y} \frac{|T[A,B](x) - T[A,B](y)|}{d(x,y)^\alpha}$$

$$\le 1 + \sup_{x \ne y} \left(\frac{|T[A,B](x) - T[A,B](y)|}{d(x,y)}\right)^{\alpha} |T[A,B](x) - T[A,B](y)|^{1-\alpha}
\le 1 + \frac{3^\alpha}{d(A,B)^\alpha}. \qquad □$$



Proof of Lemma 3.1. Without loss of generality, assume that $x = 0$. For $v \in \mathbb{R}^d$, let $D_v$
denote the diagonal $d \times d$-matrix with diagonal entries $v_1, \dots, v_d$. We define the operator norm



of the cixd-matrix A by \A\* 
We can characterize E(0, 



sup ye 
and R d \ E(0 



Observe that |D„|* = max^i^..^ \vi 



+ 



by 
respectively. Thus, for any $z \in E\bigl(0, \frac{j}{m}\bigr)$ and $y \in \mathbb{R}^d \setminus E\bigl(0, \frac{j}{m} + \frac{1}{m}\bigr)$,



\y-z\> 

> 
> 



D 



-l 

77i m 



I) ; y I) / : I) I) ': 



m m 



m 1 m 



-i 



m m 



m 1 m 



mm 

i=l,...,c 



-1 



J 1 
— + — 
777, 777 



D" 1 ,D 

— + — 

mm' 



1 — max 



H 

m 



i=i,...,d | i + i 

m m 



< 



Dm 2 



since j\ E {0, . . . , Dm — 1}. 



□ 



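Proof of Lemma 3.8. For any $\varepsilon > 0$, set $K_\varepsilon = \sup\{K > 0 : \mu([-K, K]^d) \le 1 - \varepsilon\}$. We will
denote the function $(0,1) \to \mathbb{R}_+$, $\varepsilon \mapsto K_\varepsilon$ by $K_\bullet$. Now, introduce the bracket $[L, U_\varepsilon]$, given
by

$$L :\equiv 0$$

and

$$U_\varepsilon := T\bigl[\mathbb{R}^d \setminus [-K_{\varepsilon^s/2}, K_{\varepsilon^s/2}]^d,\ [-K_{\varepsilon^s}, K_{\varepsilon^s}]^d\bigr].$$

Obviously, we have $\|U_\varepsilon - L\|_s \le \|U_\varepsilon - L\|_1^{1/s} \le \varepsilon$.

To get a bound for the Hölder norm of $U_\varepsilon$, consider the distribution function

$$G(t) := \mu(\{x \in \mathbb{R}^d : |x|_{\max} \le t\})$$

on $\mathbb{R}$, where $|x|_{\max} = \max\{|x_i| : i = 1, \dots, d\}$. Observe that the pseudo-inverse $G^{-1}$ of $G$ is
linked to $K_\bullet$ by the equality $K_\varepsilon = G^{-1}(1 - \varepsilon)$. With geometrical arguments we infer

$$G(t) = \sum_{j \in \{-1,1\}^d} \sigma(j)\,F(tj),$$

where $\sigma(j) := \prod_{i=1}^{d} j_i \in \{-1, 1\}$. Therefore

$$\omega_G(x) = \sup_{t \in \mathbb{R}} \{G(t+x) - G(t)\} = \sup_{t \in \mathbb{R}} \sum_{j \in \{-1,1\}^d} \sigma(j)\bigl(F((t+x)j) - F(tj)\bigr)$$

$$\le \sum_{j \in \{-1,1\}^d} \sup_{t \in \mathbb{R}} |F((t+x)j) - F(tj)| \le \sum_{j \in \{-1,1\}^d} \omega_F(\sqrt{d}\,x)
\le 2^d\,\omega_F(\sqrt{d}\,x).$$

Now by Lemma 3.1 we obtain

$$\|U_\varepsilon\|_\alpha \le 1 + \frac{3^\alpha}{(K_{\varepsilon^s/2} - K_{\varepsilon^s})^\alpha}
\le 1 + 3^\alpha \Bigl(\inf\Bigl\{x > 0 : \exists t \in \mathbb{R} \text{ such that } G(t+x) - G(t) \ge \frac{\varepsilon^s}{2}\Bigr\}\Bigr)^{-\alpha}$$

$$\le 1 + 3^\alpha \Bigl(\inf\Bigl\{x > 0 : \omega_G(x) \ge \frac{\varepsilon^s}{2}\Bigr\}\Bigr)^{-\alpha}
\le 1 + 3^\alpha \Bigl(\sup\Bigl\{x > 0 : \omega_F(\sqrt{d}\,x) \le \frac{\varepsilon^s}{2^{d+1}}\Bigr\}\Bigr)^{-\alpha}
= 1 + (3\sqrt{d})^\alpha \bigl(\omega_F^{-1}(2^{-(d+1)}\varepsilon^s)\bigr)^{-\alpha},$$

where we used that $\omega_F$ is continuous here to replace the infimum by the supremum.

Then $[L, U_\varepsilon]$ is an $(\varepsilon, 4\sqrt{d}\,(\omega_F^{-1}(2^{-(d+1)}\varepsilon^s))^{-\alpha}, \mathcal{G}, L^s(\mu))$-bracket for sufficiently small $\varepsilon$. Since
$[L, U_\varepsilon]$ contains any $f \in \mathcal{F} \setminus \mathcal{F}_{K_{\varepsilon^s/2} + D}$, by (3.2) we obtain for all those $\varepsilon$ the bound

$$N\bigl(\varepsilon, \max\{f(\varepsilon),\ 4\sqrt{d}\,(\omega_F^{-1}(2^{-(d+1)}\varepsilon^s))^{-\alpha}\}, \mathcal{F}, \mathcal{G}, L^s(\mu)\bigr) \le C(K_{\varepsilon^s/2} + D)^p \varepsilon^{-q} + 1.$$

Let us finally consider the growth rate of $K_{\varepsilon^s/2}$ as $\varepsilon \to 0$. By assumption (3.3) and since
$|\cdot|_{\max} \le |\cdot|$, we have $1 - G(t) \le b\,t^{-1/\beta}$ for sufficiently large $t$. Therefore,

$$G((b/\varepsilon)^\beta) \ge 1 - \varepsilon.$$

By the definition of $K_\bullet$, we therefore obtain that $K_{\varepsilon^s/2} \le (2b/\varepsilon^s)^\beta = O_{\beta,b}(\varepsilon^{-\beta s})$, which proves
the lemma. □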



Proof of Theorem 5.1. (i) We will first show that $X^{(m)}$ converges in distribution to some
random variable $X$. We denote by $L^{(m)}$ the distribution of $X^{(m)}$; this is defined since
$X^{(m)}$ is measurable. Moreover, $L^{(m)}$ is a separable Borel probability measure on $S$. By
Theorem 1.12.4 of van der Vaart and Wellner (1996), weak convergence of separable Borel
measures on a metric space $S$ can be metrized by the bounded Lipschitz metric, defined by

$$d_{BL_1}(L_1, L_2) = \sup_{f \in BL_1} \left|\int f(x)\,dL_1(x) - \int f(x)\,dL_2(x)\right| \tag{5.6}$$

for any Borel measures $L_1, L_2$ on $S$. Here, $BL_1 := \{f : S \to \mathbb{R} : \|f\|_{BL_1} \le 1\}$, where

$$\|f\|_{BL_1} := \max\Bigl\{\sup_{x \in S} |f(x)|,\ \sup_{x \ne y \in S} \frac{|f(x) - f(y)|}{\rho(x,y)}\Bigr\}.$$

In addition, the theorem states that the space of all separable Borel measures on a complete
space is complete with respect to the bounded Lipschitz metric. Thus it suffices to show
that $L^{(m)}$ is a $d_{BL_1}$-Cauchy sequence. We obtain

$$d_{BL_1}(L^{(m)}, L^{(l)}) = \sup_{f \in BL_1} |Ef(X^{(m)}) - Ef(X^{(l)})|$$

$$\le \sup_{f \in BL_1} \Bigl\{ |Ef(X^{(m)}) - E^*f(X_n^{(m)})| + |E^*f(X_n^{(m)}) - E^*f(X_n)|
+ |E^*f(X_n) - E^*f(X_n^{(l)})| + |E^*f(X_n^{(l)}) - Ef(X^{(l)})| \Bigr\}$$

for all $n \in \mathbb{N}$. For a Borel measurable separable random element $X^{(m)}$, weak convergence
$X_n^{(m)} \xrightarrow{\mathcal{D}} X^{(m)}$ as $n \to \infty$ is equivalent to $\sup_{f \in BL_1} |Ef(X^{(m)}) - E^*f(X_n^{(m)})| \to 0$; see
van der Vaart and Wellner (1996, p. 73). Hence by (5.1) we obtain

$$d_{BL_1}(L^{(m)}, L^{(l)}) \le \liminf_{n \to \infty} \sup_{f \in BL_1} \Bigl\{ |E^*f(X_n^{(m)}) - E^*f(X_n)| + |E^*f(X_n) - E^*f(X_n^{(l)})| \Bigr\}.$$

Using Lemma 1.2.2 (iii) in van der Vaart and Wellner (1996), we obtain

$$|E^*f(X_n^{(m)}) - E^*f(X_n)| \le E\bigl(|f(X_n) - f(X_n^{(m)})|^*\bigr)$$

and therefore

$$\sup_{f \in BL_1} |E^*f(X_n^{(m)}) - E^*f(X_n)| \le E\bigl(\rho(X_n, X_n^{(m)}) \wedge 2\bigr)^*
= \int_0^\infty P^*\bigl(\rho(X_n, X_n^{(m)}) \wedge 2 > t\bigr)\,dt, \tag{5.7}$$

where we used the last statement of Lemma 1.2.2 in van der Vaart and Wellner (1996). Now,
let $\varepsilon > 0$ be given. By (5.2), there exists an $m_0 \in \mathbb{N}$ such that for every $m \ge m_0$ there is
some $n_0 \in \mathbb{N}$ such that for every $n \ge n_0$ we have $P^*(\rho(X_n, X_n^{(m)}) \ge \varepsilon/3) \le \varepsilon/3$. Therefore

$$P^*\bigl(\rho(X_n, X_n^{(m)}) \wedge 2 > t\bigr) \le
\begin{cases}
1, & \text{if } t < \frac{\varepsilon}{3}, \\
\frac{\varepsilon}{3}, & \text{if } \frac{\varepsilon}{3} \le t < 2, \\
0, & \text{if } 2 \le t.
\end{cases}$$

Applying this inequality to (5.7), we obtain

$$\liminf_{n \to \infty} \sup_{f \in BL_1} |E^*f(X_n^{(m)}) - E^*f(X_n)| \le \int_0^2 \Bigl(\frac{\varepsilon}{3} + 1_{\{t < \varepsilon/3\}}\Bigr)\,dt = \varepsilon$$

for all $m \ge m_0$. Hence for $l, m \ge m_0$ we have $d_{BL_1}(L^{(m)}, L^{(l)}) \le 2\varepsilon$; i.e. $(L^{(m)})_{m \in \mathbb{N}}$ is a
$d_{BL_1}$-Cauchy sequence in a complete metric space.



(ii) The remaining part of the proof follows closely the proof of Theorem 4.2 in Billingsley
(1968), replacing the probability measure $P$ by the outer measure $P^*$ where necessary and
making use of the Portmanteau theorem, see van der Vaart and Wellner (1996), Theorem
1.3.4 (iii), and the sub-additivity of outer measures. From part (i), we already know that
there is some measurable $X$ such that $X^{(m)} \xrightarrow{\mathcal{D}} X$. Let $F \subset S$ be closed. Given $\varepsilon > 0$, we
define the $\varepsilon$-neighborhood $F_\varepsilon := \{s \in S : \inf_{x \in F} \rho(s,x) \le \varepsilon\}$, and observe that $F_\varepsilon$ is also
closed. Since $\{X_n \in F\} \subset \{X_n^{(m)} \in F_\varepsilon\} \cup \{\rho(X_n^{(m)}, X_n) > \varepsilon\}$, we obtain

$$P^*(X_n \in F) \le P^*(X_n^{(m)} \in F_\varepsilon) + P^*(\rho(X_n^{(m)}, X_n) > \varepsilon),$$

for all $m \in \mathbb{N}$. By (5.2) we may choose $m_0$ so large that for all $m \ge m_0$

$$\limsup_{n \to \infty} P^*(\rho(X_n^{(m)}, X_n) > \varepsilon) \le \varepsilon/2.$$

As $X^{(m)} \xrightarrow{\mathcal{D}} X$, by the Portmanteau theorem we may choose $m_1$ so large that for all $m \ge m_1$

$$P(X^{(m)} \in F_\varepsilon) \le P(X \in F_\varepsilon) + \varepsilon/2.$$

We now fix $m \ge \max(m_0, m_1)$. By (5.1) we have $X_n^{(m)} \xrightarrow{\mathcal{D}} X^{(m)}$ as $n \to \infty$. Thus an
application of the Portmanteau theorem yields

$$\limsup_{n \to \infty} P^*(X_n^{(m)} \in F_\varepsilon) \le P(X^{(m)} \in F_\varepsilon),$$

$$\limsup_{n \to \infty} P^*(X_n \in F) \le P(X \in F_\varepsilon) + \varepsilon.$$

Since this holds for any $\varepsilon > 0$ and $\lim_{\varepsilon \to 0} P(X \in F_\varepsilon) = P(X \in F)$, we get

$$\limsup_{n \to \infty} P^*(X_n \in F) \le P(X \in F),$$

for all closed sets $F \subset S$. By a final application of the Portmanteau theorem we infer
$X_n \xrightarrow{\mathcal{D}} X$. □